3 months ago

MURAL: Multimodal, Multitask Retrieval Across Languages

Aashi Jain Mandy Guo Krishna Srinivasan Ting Chen Sneha Kudugunta Chao Jia Yinfei Yang Jason Baldridge

Abstract

Both image-caption pairs and translation pairs provide the means to learn deep representations of and connections between languages. We use both types of pairs in MURAL (MUltimodal, MUltitask Representations Across Languages), a dual encoder that solves two tasks: 1) image-text matching and 2) translation pair matching. By incorporating billions of translation pairs, MURAL extends ALIGN (Jia et al., PMLR 2021), a state-of-the-art dual encoder learned from 1.8 billion noisy image-text pairs. When using the same encoders, MURAL matches or exceeds ALIGN's cross-modal retrieval performance on well-resourced languages across several datasets. More importantly, it considerably improves performance on under-resourced languages, showing that text-text learning can overcome a paucity of image-caption examples for these languages. On the Wikipedia Image-Text dataset, for example, MURAL-base improves zero-shot mean recall by 8.1% on average for eight under-resourced languages and by 6.8% on average when fine-tuning. We additionally show that MURAL's text representations cluster not only with respect to genealogical connections but also based on areal linguistics, such as the Balkan Sprachbund.
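Both matching tasks rest on the same idea: the dual encoder is trained so that each item's embedding scores highest against its own partner within a batch. The sketch below shows one way such a two-task objective could be written, assuming an in-batch softmax contrastive loss applied in both directions (similar in style to ALIGN's loss), a fixed temperature, and equal task weights. The names `multitask_loss`, `w_i2t`, `w_t2t`, and the temperature value are hypothetical illustrations, not settings taken from the paper. The encoders themselves are out of scope; the functions take already-computed embeddings.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def similarity_matrix(a, b):
    """Entry [i][j] is the dot product of a[i] and b[j]."""
    return [[sum(x * y for x, y in zip(u, v)) for v in b] for u in a]

def diagonal_cross_entropy(logits):
    """Mean softmax cross-entropy where row i's correct column is i (its true pair)."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def bidirectional_contrastive_loss(left, right, temperature=0.05):
    """In-batch contrastive loss, left->right plus right->left.
    Assumption: a fixed temperature; ALIGN-style models may learn it instead."""
    left = [l2_normalize(v) for v in left]
    right = [l2_normalize(v) for v in right]
    logits = [[s / temperature for s in row] for row in similarity_matrix(left, right)]
    transposed = [list(col) for col in zip(*logits)]
    return diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed)

def multitask_loss(images, captions, sources, targets,
                   w_i2t=1.0, w_t2t=1.0, temperature=0.05):
    """Hypothetical weighted sum of the image-text and translation-pair losses."""
    return (w_i2t * bidirectional_contrastive_loss(images, captions, temperature)
            + w_t2t * bidirectional_contrastive_loss(sources, targets, temperature))

# Toy batch of two pairs per task (made-up 2-D embeddings).
images = [[1.0, 0.0], [0.0, 1.0]]
captions = [[0.9, 0.1], [0.1, 0.9]]
sources = [[1.0, 1.0], [1.0, -1.0]]
targets = [[0.8, 1.0], [1.0, -0.7]]

aligned = multitask_loss(images, captions, sources, targets)
shuffled = multitask_loss(images, captions[::-1], sources, targets[::-1])
print(aligned < shuffled)  # correctly paired batches give the lower loss
```

Because both tasks feed gradients into the same text encoder in a setup like this, translation pairs can shape the text embeddings for languages that have few image-caption examples, which is the effect the abstract reports for under-resourced languages.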

Benchmarks

Benchmark                             Method         Metric (avg ± std)
semantic-textual-similarity-on-cxc    MURAL-large    74.1 ± 0.4
semantic-textual-similarity-on-cxc    ALIGN-L2       72.9 ± 0.4
semantic-textual-similarity-on-cxc    DE-T2T+I2T     74.5 ± 0.4