


Multi-Modal Open-Domain Dialogue

Kurt Shuster, Eric Michael Smith, Da Ju, Jason Weston


Abstract

Recent work in open-domain conversational agents has demonstrated that significant improvements in model engagingness and humanness metrics can be achieved via massive scaling in both pre-training data and model size (Adiwardana et al., 2020; Roller et al., 2020). However, if we want to build agents with human-like abilities, we must expand beyond handling just text. A particularly important topic is the ability to see images and communicate about what is perceived. With the goal of engaging humans in multi-modal dialogue, we investigate combining components from state-of-the-art open-domain dialogue agents with those from state-of-the-art vision models. We study incorporating different image fusion schemes and domain-adaptive pre-training and fine-tuning strategies, and show that our best resulting model outperforms strong existing models in multi-modal dialogue while simultaneously performing as well as its predecessor (text-only) BlenderBot (Roller et al., 2020) in text-based conversation. We additionally investigate and incorporate safety components in our final model, and show that such efforts do not diminish model performance with respect to engagingness metrics.
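The abstract refers to comparing image fusion schemes without spelling them out. Purely as an illustration, the sketch below contrasts two common options. In early fusion, a projected image embedding is prepended to the token embeddings and encoded jointly with the text. In late fusion, the text is encoded on its own and the image embedding is concatenated afterwards. Everything here is an assumption: `project`, `toy_encoder`, `early_fusion`, and `late_fusion` are hypothetical helpers, plain Python lists stand in for tensors, and `toy_encoder` is a crude mean-mixing stand-in for a Transformer encoder, not the model's actual architecture.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[Vector]


def project(image_feature: Vector, weight: Matrix) -> Vector:
    """Hypothetical linear projection from a vision-model feature to the
    dialogue model's hidden size. `weight` is hidden_dim x image_dim."""
    return [sum(w * x for w, x in zip(row, image_feature)) for row in weight]


def toy_encoder(sequence: Matrix) -> Matrix:
    """Hypothetical stand-in for a Transformer encoder: each output position is
    its input plus the mean of the whole sequence, a crude imitation of
    self-attention letting positions see each other."""
    dim = len(sequence[0])
    mean = [sum(vec[i] for vec in sequence) / len(sequence) for i in range(dim)]
    return [[v + m for v, m in zip(vec, mean)] for vec in sequence]


def early_fusion(tokens: Matrix, image: Vector,
                 encoder: Callable[[Matrix], Matrix]) -> Matrix:
    # The image acts like an extra leading token and is encoded together with the text.
    return encoder([image] + tokens)


def late_fusion(tokens: Matrix, image: Vector,
                encoder: Callable[[Matrix], Matrix]) -> Matrix:
    # The text is encoded alone; the image embedding is appended afterwards, so a
    # decoder attending over the result sees both, but the text encodings ignore the image.
    return encoder(tokens) + [image]


tokens = [[1.0, 0.0], [0.0, 1.0]]
image = project([2.0, 4.0, 6.0], [[0.5, 0.0, 0.0], [0.0, 0.25, 0.0]])
print(image)  # [1.0, 1.0]
print(late_fusion(tokens, image, toy_encoder))  # [[1.5, 0.5], [0.5, 1.5], [1.0, 1.0]]
print(early_fusion(tokens, image, toy_encoder))
```

With early fusion, the text positions' encodings shift because the encoder mixes in the image. With late fusion, they are identical to a text-only encoding, and any interaction between modalities has to happen later (for example, in decoder attention).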

Benchmarks

Benchmark                               Methodology               BLEU-4   F1     ROUGE-L
visual-dialog-on-blendedskilltalk       Multi-Modal BlenderBot    1        17.8   19.3
visual-dialog-on-convai2                Multi-Modal BlenderBot    1.1      18.4   22.6
visual-dialog-on-empatheticdialogues    Multi-Modal BlenderBot    1.5      19.2   24.5
visual-dialog-on-image-chat             Multi-Modal BlenderBot    40       13.1   18
visual-dialog-on-wizard-of-wikipedia    Multi-Modal BlenderBot    2.2      18.6   17.4
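In dialogue evaluation of this kind, F1 usually denotes a unigram-overlap F1 between the generated response and the reference response. The minimal sketch below assumes whitespace tokenization and a SQuAD-style normalization (lowercasing, stripping punctuation, dropping English articles). The helper names `normalize` and `unigram_f1` are hypothetical, and this is not necessarily the exact scorer behind the numbers above, which appear to be scaled to a 0-100 range.

```python
import re
import string
from collections import Counter
from typing import List

# Assumed SQuAD-style normalization: lowercase, drop punctuation, drop English articles.
_ARTICLES = re.compile(r"\b(a|an|the)\b")


def normalize(text: str) -> List[str]:
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = _ARTICLES.sub(" ", text)
    return text.split()


def unigram_f1(prediction: str, reference: str) -> float:
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    # Multiset intersection: a word is counted at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(unigram_f1("The dog is on the beach!", "a dog running on a beach"))  # 0.75
```

Here both responses normalize to four tokens that share three ("dog", "on", "beach"), so precision and recall are both 0.75. A corpus-level score is typically the mean of per-example values.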
