6 months ago

Lili Yu, Bowen Shi, Ramakanth Pasunuru, Benjamin Muller, Olga Golovneva, Tianlu Wang, Arun Babu, Binh Tang, Brian Karrer, Shelly Sheynin, and 17 more

Abstract

We present CM3Leon (pronounced "Chameleon"), a retrieval-augmented, token-based, decoder-only multi-modal language model capable of generating and infilling both text and images. CM3Leon uses the CM3 multi-modal architecture but additionally shows the extreme benefits of scaling up and tuning on more diverse instruction-style data. It is the first multi-modal model trained with a recipe adapted from text-only language models, including a large-scale retrieval-augmented pre-training stage and a second multi-task supervised fine-tuning (SFT) stage. It is also a general-purpose model that can do both text-to-image and image-to-text generation, allowing us to introduce self-contained contrastive decoding methods that produce high-quality outputs. Extensive experiments demonstrate that this recipe is highly effective for multi-modal models. CM3Leon achieves state-of-the-art performance in text-to-image generation with 5x less training compute than comparable methods (zero-shot MS-COCO FID of 4.88). After SFT, CM3Leon can also demonstrate unprecedented levels of controllability in tasks ranging from language-guided image editing to image-controlled generation and segmentation.

Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
