Scalable Diffusion Models with Transformers

William Peebles, Saining Xie

Abstract

We explore a new class of diffusion models based on the transformer architecture. We train latent diffusion models of images, replacing the commonly-used U-Net backbone with a transformer that operates on latent patches. We analyze the scalability of our Diffusion Transformers (DiTs) through the lens of forward pass complexity as measured by Gflops. We find that DiTs with higher Gflops -- through increased transformer depth/width or increased number of input tokens -- consistently have lower FID. In addition to possessing good scalability properties, our largest DiT-XL/2 models outperform all prior diffusion models on the class-conditional ImageNet 512x512 and 256x256 benchmarks, achieving a state-of-the-art FID of 2.27 on the latter.
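To make the token-count lever concrete, below is a minimal, hypothetical PyTorch sketch of the patchify step described in the abstract. It is not the official facebookresearch/DiT code; the 32x32x4 latent shape (a 256x256 image encoded by an 8x-downsampling VAE) and the 1152-dimensional hidden size are assumptions matching commonly reported DiT-XL settings. The point it illustrates: halving the patch size quadruples the number of input tokens, and therefore the transformer's forward-pass Gflops, which is the "/2" in DiT-XL/2.

```python
# Minimal sketch (an assumption, not the official facebookresearch/DiT code) of the
# patchify step: a latent image is cut into non-overlapping patches, and each
# patch becomes one transformer token.
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    def __init__(self, patch_size=2, in_channels=4, hidden_size=1152):
        super().__init__()
        # A convolution with kernel_size == stride is equivalent to
        # "split into p x p patches, then apply a shared linear projection".
        self.proj = nn.Conv2d(in_channels, hidden_size,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        # x: (B, C, H, W) latent from a VAE encoder
        x = self.proj(x)                     # (B, hidden, H/p, W/p)
        return x.flatten(2).transpose(1, 2)  # (B, num_tokens, hidden)

# Assumed latent: a 256x256 image encoded by an 8x-downsampling VAE -> 32x32x4.
latents = torch.randn(1, 4, 32, 32)
for p in (8, 4, 2):  # smaller patches -> more tokens -> more Gflops per forward pass
    tokens = PatchEmbed(patch_size=p)(latents)
    print(f"patch size {p}: {tokens.shape[1]} tokens")
# patch size 8: 16 tokens; patch size 4: 64 tokens; patch size 2: 256 tokens
```

The convolution-as-patchify formulation keeps the sketch short; an equivalent reshape-plus-linear-layer version would produce the same token sequence.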

Code Repositories

senmaoy/RAT-Diffusion (PyTorch, mentioned in GitHub)
facebookresearch/DiT (Official; PyTorch, mentioned in GitHub)
milmor/diffusion-transformer (PyTorch, mentioned in GitHub)
FineDiffusion/FineDiffusion (PyTorch, mentioned in GitHub)
nyu-systems/grendel-gs (PyTorch, mentioned in GitHub)
locuslab/get (PyTorch, mentioned in GitHub)
chuanyangjin/fast-dit (PyTorch, mentioned in GitHub)
hustvl/dig (PyTorch, mentioned in GitHub)

Benchmarks

Benchmark                              Methodology   Metrics
image-generation-on-imagenet-256x256   DiT-XL/2      FID: 2.27
image-generation-on-imagenet-512x512   DiT-XL/2      FID: 3.04; Inception score: 240.82
