Composing Ensembles of Pre-trained Models via Iterative Consensus

Shuang Li Yilun Du Joshua B. Tenenbaum Antonio Torralba Igor Mordatch

Abstract

Large pre-trained models exhibit distinct and complementary capabilities depending on the data they are trained on. Language models such as GPT-3 are capable of textual reasoning but cannot understand visual information, while vision models such as DALL-E can generate photorealistic images but fail to understand complex language descriptions. In this work, we propose a unified framework for composing ensembles of different pre-trained models -- combining the strengths of each individual model to solve various multimodal problems in a zero-shot manner. We use pre-trained models as "generators" or "scorers" and compose them via closed-loop iterative consensus optimization. The generator constructs proposals and the scorers iteratively provide feedback to refine the generated result. Such closed-loop communication enables models to correct errors caused by other models, significantly boosting performance on downstream tasks, e.g., improving accuracy on grade school math problems by 7.5%, without requiring any model finetuning. We demonstrate that consensus achieved by an ensemble of scorers outperforms the feedback of a single scorer, by leveraging the strengths of each expert model. Results show that the proposed method can be used as a general-purpose framework for a wide range of zero-shot multimodal tasks, such as image generation, video question answering, mathematical reasoning, and robotic manipulation. Project page: https://energy-based-model.github.io/composing-pretrained-models.
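The closed-loop scheme described above can be sketched as a simple loop: a generator proposes candidates, an ensemble of scorers rates each one, and the consensus best seeds the next round. This is a minimal toy sketch, not the paper's implementation; the generator, the two quadratic scorers, and all function names here are hypothetical stand-ins for the pre-trained models.

```python
import random

def iterative_consensus(generate, scorers, steps=50, proposals=8, seed=0):
    """Closed-loop refinement: keep the proposal with the highest
    consensus (summed) score across all scorers, and use it to seed
    the next round of proposals."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(steps):
        candidates = [generate(rng, best) for _ in range(proposals)]
        for cand in candidates:
            score = sum(s(cand) for s in scorers)  # ensemble consensus
            if score > best_score:
                best, best_score = cand, score
    return best, best_score

def toy_generate(rng, prev):
    # Toy "generator": perturb the current best proposal (start at 0).
    base = prev if prev is not None else 0.0
    return base + rng.uniform(-1.0, 1.0)

# Two toy "scorers" that disagree: one prefers values near 3, one near 5.
# Their consensus optimum is x = 4.
scorers = [lambda x: -(x - 3) ** 2, lambda x: -(x - 5) ** 2]
best, score = iterative_consensus(toy_generate, scorers)
```

The point of the sketch is that no single scorer's optimum wins: the loop converges toward the value the ensemble jointly prefers, mirroring how consensus among expert models outperforms feedback from any single scorer.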

Benchmarks

Arithmetic Reasoning on GSM8K

| Methodology | Accuracy | Parameters (Billion) |
|---|---|---|
| GPT-2-Medium 355M + question-solution classifier (BS=1) | 16.8 | 0.355 |
| GPT-2-Medium 355M (fine-tuned, BS=5) | 18.3 | 0.355 |
| GPT-2-Medium 355M (BS=5) | 12.2 | 0.355 |
| GPT-2-Medium 355M + question-solution classifier (BS=5) | 20.8 | 0.355 |

Image Generation on ImageNet 64x64

| Methodology | FID | Inception Score | KID |
|---|---|---|---|
| GLIDE + CLS-FREE | 29.219 | 25.926 | 5.325 |
| GLIDE + CLS | 30.871 | 22.077 | 7.952 |
| GLIDE + CLIP | 30.462 | 25.017 | 6.174 |
| GLIDE + CLIP + CLS + CLS-FREE | 29.184 | 34.952 | 3.766 |

Video Question Answering on ActivityNet-QA

| Methodology | Accuracy |
|---|---|
| GPT-2 + CLIP-14 + CLIP-multilingual (Zero-Shot) | 61.2 |
| GPT-2 + CLIP-32 (Zero-Shot) | 58.4 |
