5 months ago

Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts

Wu Jialin ; Hu Xia ; Wang Yaqing ; Pang Bo ; Soricut Radu

Abstract

Large multi-modal models (LMMs) exhibit remarkable performance across numerous tasks. However, generalist LMMs often suffer from performance degradation when tuned over a large collection of tasks. Recent research suggests that Mixture of Experts (MoE) architectures are useful for instruction tuning, but for LMMs of parameter size around O(50-100B), the prohibitive cost of replicating and storing the expert models severely limits the number of experts we can use. We propose Omni-SMoLA, an architecture that uses the Soft MoE approach to (softly) mix many multimodal low rank experts, and avoids introducing a significant number of new parameters compared to conventional MoE models. The core intuition here is that the large model provides a foundational backbone, while different lightweight experts residually learn specialized knowledge, either per-modality or multimodally. Extensive experiments demonstrate that the SMoLA approach helps improve the generalist performance across a broad range of generative vision-and-language tasks, achieving new SoTA generalist performance that often matches or outperforms single specialized LMM baselines, as well as new SoTA specialist performance.
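
The idea of a frozen backbone with softly mixed low-rank experts added residually can be illustrated with a small sketch. This is not the paper's implementation: the per-token softmax routing, the `router` matrix, and every function name and dimension below are assumptions chosen for illustration, and the actual Soft MoE routing used by Omni-SMoLA may differ.

```python
import math
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix, used here only to build toy weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def smola_linear(x, backbone, router, experts):
    """Backbone output plus a softly weighted sum of low-rank expert updates.

    x:        input vector of size d_in
    backbone: frozen weight matrix, d_out x d_in
    router:   hypothetical routing matrix, num_experts x d_in (an assumption)
    experts:  list of (down, up) pairs; down is r x d_in, up is d_out x r
    """
    out = matvec(backbone, x)
    # Soft routing: every expert receives a nonzero weight, none is dropped.
    weights = softmax(matvec(router, x))
    for w, (down, up) in zip(weights, experts):
        delta = matvec(up, matvec(down, x))  # rank-r residual update
        out = [o + w * d for o, d in zip(out, delta)]
    return out, weights

def added_params(d_in, d_out, rank, num_experts):
    """Parameters added by the low-rank experts plus the toy router."""
    return num_experts * (rank * d_in + d_out * rank) + num_experts * d_in

def replicated_params(d_in, d_out, num_experts):
    """Parameters needed if each expert were a full copy of the layer."""
    return num_experts * d_in * d_out

# Toy sizes, chosen only to keep the example fast.
rng = random.Random(0)
d_in, d_out, rank, num_experts = 8, 8, 2, 4
backbone = rand_matrix(d_out, d_in, rng)
router = rand_matrix(num_experts, d_in, rng)
experts = [(rand_matrix(rank, d_in, rng), rand_matrix(d_out, rank, rng))
           for _ in range(num_experts)]
x = [rng.uniform(-1, 1) for _ in range(d_in)]

y, weights = smola_linear(x, backbone, router, experts)
print("routing weights:", [round(w, 3) for w in weights])
print("added params:", added_params(d_in, d_out, rank, num_experts))
print("replicated params:", replicated_params(d_in, d_out, num_experts))
```

The gap between the two parameter counts grows quickly with layer width. For a hypothetical 4096 x 4096 layer with 48 experts of rank 4, `added_params` gives 1,769,472, while `replicated_params` gives 805,306,368. Under these assumed sizes, this is the storage cost that the abstract describes as limiting the number of conventional experts.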

Benchmarks

Benchmark | Methodology | Metrics
chart-question-answering-on-chartqa | SMoLA-PaLI-X Generalist Model | 1:1 Accuracy: 73.8
chart-question-answering-on-chartqa | SMoLA-PaLI-X Specialist Model | 1:1 Accuracy: 74.6
object-counting-on-tallyqa-complex | SMoLA-PaLI-X Specialist | Accuracy: 77.1
object-counting-on-tallyqa-complex | SMoLA-PaLI-X Generalist (0 shot) | Accuracy: 70.7
object-counting-on-tallyqa-simple | SMoLA-PaLI-X Generalist (0 shot) | Accuracy: 83.3
object-counting-on-tallyqa-simple | SMoLA-PaLI-X Specialist | Accuracy: 86.3
visual-question-answering-on-a-okvqa | SMoLA-PaLI-X Specialist Model | DA VQA Score: 70.55; MC Accuracy: 83.75
visual-question-answering-on-docvqa-test | SMoLA-PaLI-X Generalist | ANLS: 0.906
visual-question-answering-on-docvqa-test | SMoLA-PaLI-X Specialist | ANLS: 0.908
visual-question-answering-vqa-on | SMoLA-PaLI-X Specialist | ANLS: 66.2
visual-question-answering-vqa-on | SMoLA-PaLI-X Generalist | ANLS: 65.6
visual-question-answering-vqa-on-ai2d | SMoLA-PaLI-X Specialist Model | EM: 82.5
visual-question-answering-vqa-on-ai2d | SMoLA-PaLI-X Generalist Model | EM: 81.4
