Autoregressive Image Generation using Residual Quantization

Doyup Lee, Chiheon Kim, Saehoon Kim, Minsu Cho, Wook-Shin Han

Abstract

For autoregressive (AR) modeling of high-resolution images, vector quantization (VQ) represents an image as a sequence of discrete codes. A short sequence is important because it reduces the computational cost an AR model incurs when modeling long-range interactions among codes. However, we postulate that, in terms of the rate-distortion trade-off, previous VQ methods cannot both shorten the code sequence and generate high-fidelity images. In this study, we propose a two-stage framework, consisting of a Residual-Quantized VAE (RQ-VAE) and an RQ-Transformer, to generate high-resolution images effectively. Given a fixed codebook size, RQ-VAE can precisely approximate a feature map of an image and represent the image as a stacked map of discrete codes. Then, RQ-Transformer learns to predict the quantized feature vector at the next position by predicting the next stack of codes. Thanks to the precise approximation of RQ-VAE, a 256$\times$256 image can be represented as an 8$\times$8 feature map, which lets RQ-Transformer reduce computational costs efficiently. Consequently, our framework outperforms existing AR models on various benchmarks of unconditional and conditional image generation. It also samples high-quality images significantly faster than previous AR models.
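
To make the "stacked map of discrete codes" concrete, the following is a minimal NumPy sketch of greedy residual quantization, not the authors' implementation. It assumes a single codebook shared across all depths and uses a random codebook as a stand-in for a learned one. The function name `residual_quantize`, the feature dimension, the codebook size, and the depth are all hypothetical choices for illustration.

```python
import numpy as np


def residual_quantize(z, codebook, depth):
    """Greedy residual quantization of feature vectors (illustrative sketch).

    z:        (..., dim) array of feature vectors
    codebook: (K, dim) array, shared across all depths (assumption of this sketch)
    depth:    number of codes stacked at each position
    Returns (codes, z_hat): codes has shape (..., depth), z_hat has the shape of z.
    """
    residual = z.astype(np.float64).copy()
    z_hat = np.zeros_like(residual)
    codes = []
    for _ in range(depth):
        # Squared distance from every residual vector to every codebook entry.
        dists = ((residual[..., None, :] - codebook) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=-1)   # nearest code at this depth
        chosen = codebook[idx]
        z_hat += chosen               # running approximation of z
        residual -= chosen            # what is still unexplained
        codes.append(idx)
    return np.stack(codes, axis=-1), z_hat


# Toy usage: an 8x8 feature map with hypothetical dim=4, K=32, depth=4.
rng = np.random.default_rng(0)
feature_map = rng.normal(size=(8, 8, 4))
codebook = rng.normal(size=(32, 4))
codes, z_hat = residual_quantize(feature_map, codebook, depth=4)
print(codes.shape)  # one stack of `depth` codes per spatial position
```

With `depth=1` this reduces to ordinary VQ. Each extra depth adds another code per position without enlarging the codebook or the spatial grid, which is the mechanism the abstract credits for keeping the 8$\times$8 map short while approximating the features closely.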

Code Repositories

ai-forever/movqgan (PyTorch; mentioned in GitHub)
kakaobrain/rq-vae-transformer (Official; PyTorch; mentioned in GitHub)
lucidrains/magvit2-pytorch (PyTorch; mentioned in GitHub)
archinetai/bitcodes-pytorch (PyTorch; mentioned in GitHub)

Benchmarks

| Benchmark | Methodology | Metrics |
| --- | --- | --- |
| image-generation-on-imagenet-256x256 | RQ-Transformer | FID: 3.83 |
| image-reconstruction-on-imagenet | RQ-VAE (8x8x16) | FID: 1.83 |
| text-to-image-generation-on-conceptual | RQ-Transformer | FID: 12.33 |
