3 months ago

MaskBit: Embedding-free Image Generation via Bit Tokens

Mark Weber, Lijun Yu, Qihang Yu, Xueqing Deng, Xiaohui Shen, Daniel Cremers, Liang-Chieh Chen

Abstract

Masked transformer models for class-conditional image generation have become a compelling alternative to diffusion models. Typically comprising two stages - an initial VQGAN model for transitioning between latent space and image space, and a subsequent Transformer model for image generation within latent space - these frameworks offer promising avenues for image synthesis. In this study, we present two primary contributions: Firstly, an empirical and systematic examination of VQGANs, leading to a modernized VQGAN. Secondly, a novel embedding-free generation network operating directly on bit tokens - a binary quantized representation of tokens with rich semantics. The first contribution furnishes a transparent, reproducible, and high-performing VQGAN model, enhancing accessibility and matching the performance of current state-of-the-art methods while revealing previously undisclosed details. The second contribution demonstrates that embedding-free image generation using bit tokens achieves a new state-of-the-art FID of 1.52 on the ImageNet 256x256 benchmark, with a compact generator model of mere 305M parameters.
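
To make the notion of a bit token more concrete, the following is a minimal sketch of one way a latent vector could be binarized and handled without an embedding lookup. It is not taken from the official repository: the abstract does not specify how the binary quantization is performed, so thresholding each channel by its sign is an assumption here, and the helper names `to_bit_token`, `bits_to_index`, and `bits_to_signed` are hypothetical.

```python
from typing import List, Sequence


def to_bit_token(latent: Sequence[float]) -> List[int]:
    """Binarize a K-dimensional latent vector channel by channel.

    Assumption: quantization is done by sign (non-negative -> 1, negative -> 0).
    """
    return [1 if x >= 0 else 0 for x in latent]


def bits_to_index(bits: Sequence[int]) -> int:
    """Pack the bits (most significant first) into an integer in [0, 2**K).

    This is the index a conventional codebook lookup would use.
    """
    index = 0
    for b in bits:
        index = (index << 1) | b
    return index


def bits_to_signed(bits: Sequence[int]) -> List[float]:
    """Map bits to a {-1.0, +1.0} vector that a generator could consume
    directly, instead of looking up a learned embedding by index."""
    return [1.0 if b else -1.0 for b in bits]


# Hypothetical 4-channel latent for a single spatial position.
latent = [0.7, -1.2, 0.1, -0.3]
bits = to_bit_token(latent)
index = bits_to_index(bits)
signed = bits_to_signed(bits)
```

In this sketch, K bits per token correspond to an implicit vocabulary of 2**K entries, and the signed vector stands in for the learned embedding that an index-based generator would otherwise fetch.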

Code Repositories

markweberdev/maskbit
Official
pytorch
Mentioned in GitHub

Benchmarks

Benchmark | Methodology | Metrics
image-generation-on-imagenet-256x256 | MaskBit | FID: 1.52
image-reconstruction-on-imagenet | MaskBit (16x16) | FID: 1.66
