Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

Abstract

We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge. Parti treats text-to-image generation as a sequence-to-sequence modeling problem, akin to machine translation, with sequences of image tokens as the target outputs rather than text tokens in another language. This strategy can naturally tap into the rich body of prior work on large language models, which have seen continued advances in capabilities and performance through scaling data and model sizes. Our approach is simple. First, Parti uses a Transformer-based image tokenizer, ViT-VQGAN, to encode images as sequences of discrete tokens. Second, we achieve consistent quality improvements by scaling the encoder-decoder Transformer model up to 20B parameters, with a new state-of-the-art zero-shot FID score of 7.23 and finetuned FID score of 3.22 on MS-COCO. Our detailed analysis on Localized Narratives as well as PartiPrompts (P2), a new holistic benchmark of over 1600 English prompts, demonstrates the effectiveness of Parti across a wide variety of categories and difficulty aspects. We also explore and highlight limitations of our models in order to define and exemplify key areas of focus for further improvements. See https://parti.research.google/ for high-resolution images.
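
The idea of encoding an image as a sequence of discrete tokens can be illustrated with a toy nearest-codebook lookup, the basic step behind vector-quantized tokenizers. This is only a sketch under stated assumptions, not the ViT-VQGAN implementation: the function name `quantize`, the two-dimensional patch vectors, and the three-entry codebook are all hypothetical, and a real tokenizer learns both the patch encoder and the codebook.

```python
from typing import List, Sequence


def quantize(patch_vectors: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each patch vector to the index of its nearest codebook entry.

    Hypothetical helper for illustration: the returned list of integer ids
    plays the role of the "image tokens" that a sequence-to-sequence model
    would be trained to predict from the text prompt.
    """
    tokens = []
    for vec in patch_vectors:
        # Squared Euclidean distance from this patch to every codebook entry.
        distances = [
            sum((a - b) ** 2 for a, b in zip(vec, code))
            for code in codebook
        ]
        # The token id is the position of the closest entry.
        tokens.append(distances.index(min(distances)))
    return tokens


# Toy data (assumed values, not from the paper).
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
patches = [(0.1, -0.1), (0.9, 1.2), (4.0, 6.0)]
image_tokens = quantize(patches, codebook)
print(image_tokens)
```

With tokens of this kind as targets, text-to-image generation takes the same shape as translation: the encoder reads the text tokens and the decoder emits image tokens one at a time.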

Benchmarks

Benchmark                                   Methodology        Metrics
text-to-image-generation-on-coco-1          Parti              FID: 7.23
text-to-image-generation-on-coco-1          Parti Finetuned    FID: 3.22
text-to-image-generation-on-laion-coco      Parti              FID: 15.97
text-to-image-generation-on-laion-coco      Parti Finetuned    FID: 8.39
