PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts

Yunshui Li; Binyuan Hui; ZhiChao Yin; Min Yang; Fei Huang; Yongbin Li

Abstract

Perceiving multi-modal information and holding dialogues with humans is a long-term goal of artificial intelligence. Pre-training is commonly regarded as an effective approach for multi-modal dialogue. However, because multi-modal dialogue data is scarce, research on multi-modal dialogue pre-training remains limited. A further challenge arises from the broad scope of multi-modal dialogue, which involves various modalities and tasks. Moreover, new forms of tasks may arise at unpredictable points in the future, so multi-modal dialogue models must be flexible enough to adapt to them. This paper proposes PaCE, a unified, structured, compositional multi-modal dialogue pre-training framework. It combines several fundamental experts to accommodate multiple dialogue-related tasks and can be pre-trained using limited dialogue data and extensive non-dialogue multi-modal data. Furthermore, we propose a progressive training method in which previously trained experts assist new experts, facilitating the expansion of their capabilities. Experimental results demonstrate that PaCE achieves state-of-the-art results on eight multi-modal dialogue benchmarks.

Benchmarks

| Benchmark | Methodology | Metrics |
| --- | --- | --- |
| dialogue-state-tracking-on-mmconv | PaCE | Categorical Accuracy: 92.2; Non-Categorical Accuracy: 43.4; Overall: 39.2 |
| dialogue-state-tracking-on-simmc2-0 | PaCE | Act F1: 97.1; Slot F1: 87.0 |
| image-retrieval-on-photochat | PaCE | R@1: 15.2; R@5: 36.7; R@10: 49.6; Sum(R@1,5,10): 101.5 |
| multimodal-intent-recognition-on-mmdialog | PaCE | F1: 77.6 |
| multimodal-intent-recognition-on-photochat | PaCE | F1: 63.8; Precision: 63.3; Recall: 68 |
| response-generation-on-mmconv | PaCE | BLEU: 22; Comb.: 44.7; Inform: 34.5; Success: 13.9 |
| response-generation-on-simmc2-0 | PaCE | BLEU: 34.1 |
| text-retrieval-on-image-chat | PaCE | R@1: 51.9; R@5: 76.8; Sum(R@1,5): 128.7 |
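
The retrieval rows above report Recall@K (R@K) and a Sum metric. As a minimal sketch, assuming the usual definition of R@K as the percentage of queries whose correct item appears among the top K ranked candidates, the Sum metric is simply the sum of the individual R@K values. The function names and the toy ranks below are hypothetical illustrations and are not taken from the PaCE evaluation code.

```python
from typing import Sequence


def recall_at_k(ranks: Sequence[int], k: int) -> float:
    """Percentage of queries whose correct item is ranked in the top k.

    `ranks` holds the 1-based rank of the correct item for each query.
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    hits = sum(1 for r in ranks if r <= k)
    return 100.0 * hits / len(ranks)


def recall_sum(ranks: Sequence[int], ks: Sequence[int] = (1, 5, 10)) -> float:
    """Sum of Recall@K over the given cutoffs, e.g. Sum(R@1,5,10)."""
    return sum(recall_at_k(ranks, k) for k in ks)


# Hypothetical ranks for ten queries (illustrative only, not real data).
toy_ranks = [1, 3, 7, 12, 2, 1, 25, 5, 9, 40]
toy_r1 = recall_at_k(toy_ranks, 1)    # 2 of 10 queries hit at rank 1
toy_r5 = recall_at_k(toy_ranks, 5)    # 5 of 10 queries hit within top 5
toy_r10 = recall_at_k(toy_ranks, 10)  # 7 of 10 queries hit within top 10
toy_sum = recall_sum(toy_ranks)

# Consistency check on the reported figures from the table:
# the Sum column equals the sum of the listed R@K values.
photochat = {1: 15.2, 5: 36.7, 10: 49.6}
photochat_sum = round(sum(photochat.values()), 1)

image_chat = {1: 51.9, 5: 76.8}
image_chat_sum = round(sum(image_chat.values()), 1)
```

Under this definition the reported sums are internally consistent: 15.2 + 36.7 + 49.6 = 101.5 for PhotoChat image retrieval, and 51.9 + 76.8 = 128.7 for Image-Chat text retrieval.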
