3 months ago

Autoregressive Image Generation using Residual Quantization

Doyup Lee Chiheon Kim Saehoon Kim Minsu Cho Wook-Shin Han

Abstract

For autoregressive (AR) modeling of high-resolution images, vector quantization (VQ) represents an image as a sequence of discrete codes. A short sequence is important because it reduces the computational cost an AR model incurs when modeling long-range interactions between codes. However, we postulate that, in terms of the rate-distortion trade-off, previous VQ methods cannot both shorten the code sequence and generate high-fidelity images. In this study, we propose a two-stage framework, consisting of Residual-Quantized VAE (RQ-VAE) and RQ-Transformer, to effectively generate high-resolution images. Given a fixed codebook size, RQ-VAE can precisely approximate a feature map of an image and represent the image as a stacked map of discrete codes. Then, RQ-Transformer learns to predict the quantized feature vector at the next position by predicting the next stack of codes. Because RQ-VAE approximates the feature map precisely, a 256$\times$256 image can be represented as a feature map of only 8$\times$8 resolution, which reduces the computational cost of RQ-Transformer. Consequently, our framework outperforms existing AR models on various benchmarks of unconditional and conditional image generation. Our approach also samples high-quality images significantly faster than previous AR models.
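The idea of representing each feature vector as a stack of codes can be illustrated with a small NumPy sketch of greedy residual quantization. This is not the official `kakaobrain/rq-vae-transformer` code: the function name `residual_quantize`, the random codebook, the feature dimension, and the use of one codebook shared across all depths are assumptions made for illustration only.

```python
import numpy as np

def residual_quantize(z, codebook, depth):
    """Greedy residual quantization (illustrative sketch, not the paper's code).

    z:        (N, d) array of feature vectors
    codebook: (K, d) array, assumed here to be shared across all depths
    depth:    number of codes stacked per feature vector
    Returns the (N, depth) code indices and the (N, d) approximation.
    """
    residual = z.copy()
    approx = np.zeros_like(z)
    codes = np.empty((z.shape[0], depth), dtype=np.int64)
    for level in range(depth):
        # Squared Euclidean distance from every residual to every code vector.
        dists = ((residual[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)
        codes[:, level] = idx
        # Add the chosen code vectors, then quantize what is still unexplained.
        approx += codebook[idx]
        residual = z - approx
    return codes, approx

# Hypothetical sizes: an 8x8 feature map flattened to 64 positions of dimension 4.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(64, 4))
z = rng.normal(size=(8 * 8, 4))
codes, approx = residual_quantize(z, codebook, depth=4)
```

Each row of `codes` is the stack of indices for one position, and `approx` is the sum of the selected code vectors. In a trained model the codebook is learned rather than random, so this sketch shows only the mechanics of the stacking, not the reconstruction quality reported in the paper.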

Code Repositories

Repository	Framework	Notes
ai-forever/movqgan	PyTorch	Mentioned in GitHub
kakaobrain/rq-vae-transformer	PyTorch	Official; mentioned in GitHub
lucidrains/magvit2-pytorch	PyTorch	Mentioned in GitHub
archinetai/bitcodes-pytorch	PyTorch	Mentioned in GitHub

Benchmarks

Benchmark	Methodology	Metrics
image-generation-on-imagenet-256x256	RQ-Transformer	FID: 3.83
image-reconstruction-on-imagenet	RQ-VAE (8x8x16)	FID: 1.83
text-to-image-generation-on-conceptual	RQ-Transformer	FID: 12.33
