HyperAIHyperAI

Command Palette

Search for a command to run...

5 months ago

Shifted Diffusion for Text-to-image Generation

Yufan Zhou; Bingchen Liu; Yizhe Zhu; Xiao Yang; Changyou Chen; Jinhui Xu

Shifted Diffusion for Text-to-image Generation

Abstract

We present Corgi, a novel method for text-to-image generation. Corgi is based on our proposed shifted diffusion model, which achieves better image embedding generation from input text. Unlike the baseline diffusion model used in DALL-E 2, our method seamlessly encodes prior knowledge of the pre-trained CLIP model in its diffusion process by designing a new initialization distribution and a new transition step of the diffusion. Compared to the strong DALL-E 2 baseline, our method performs better in generating image embedding from the text in terms of both efficiency and effectiveness, resulting in better text-to-image generation. Extensive large-scale experiments are conducted and evaluated in terms of both quantitative measures and human evaluation, indicating a stronger generation ability of our method compared to existing ones. Furthermore, our model enables semi-supervised and language-free training for text-to-image generation, where only part or none of the images in the training dataset have an associated caption. Trained with only 1.7% of the images being captioned, our semi-supervised model obtains FID results comparable to DALL-E 2 on zero-shot text-to-image generation evaluated on MS-COCO. Corgi also achieves new state-of-the-art results across different datasets on downstream language-free text-to-image generation tasks, outperforming the previous method, Lafite, by a large margin.

Code Repositories

drboog/Shifted_Diffusion
Official
jax
Mentioned in GitHub

Benchmarks

BenchmarkMethodologyMetrics
text-to-image-generation-on-cocoCorgi-Semi
FID: 10.6
text-to-image-generation-on-cocoCorgi
FID: 10.88
text-to-image-generation-on-multi-modalCorgi
FID: 19.74

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding
Ready-to-use GPUs
Best Pricing
Get Started

Hyper Newsletters

Subscribe to our latest updates
We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning
Powered by MailChimp
Shifted Diffusion for Text-to-image Generation | Papers | HyperAI