CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers

Ming Ding, Wendi Zheng, Wenyi Hong, Jie Tang


Abstract

The development of transformer-based text-to-image models is impeded by their slow generation and the complexity of high-resolution images. In this work, we put forward a solution based on hierarchical transformers and local parallel autoregressive generation. We pretrain a 6B-parameter transformer with a simple and flexible self-supervised task, the Cross-Modal General Language Model (CogLM), and finetune it for fast super-resolution. The resulting text-to-image system, CogView2, shows generation quality very competitive with the concurrent state-of-the-art DALL-E-2, and naturally supports interactive text-guided editing of images.
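To illustrate the "local parallel autoregressive generation" idea mentioned in the abstract, below is a minimal conceptual sketch in PyTorch: the super-resolution token grid is split into non-overlapping local windows, and at each step the same relative position inside every window is predicted and filled in parallel, so decoding is autoregressive within a window but parallel across windows. The `model` callable, `mask_id`, shapes, and window size here are hypothetical stand-ins, not the official CogView2 API.

import torch

def local_parallel_decode(model, tokens, window=4, mask_id=0):
    """Conceptual sketch of local parallel autoregressive decoding.

    `tokens` is an (H, W) grid of image-token ids where positions still to be
    generated hold `mask_id`.  `model` is a hypothetical callable that takes
    the flattened grid and returns per-position logits of shape
    (H*W, vocab_size); it stands in for the super-resolution transformer.
    """
    H, W = tokens.shape
    for dy in range(window):          # relative row inside each local window
        for dx in range(window):      # relative column inside each local window
            logits = model(tokens.flatten())          # (H*W, vocab_size)
            probs = logits.softmax(dim=-1)
            sampled = torch.multinomial(probs, 1).view(H, W)
            # Fill only the current relative position of every window, and only
            # where the grid is still masked: each window is decoded in a fixed
            # local order while all windows advance simultaneously.
            fill = torch.zeros(H, W, dtype=torch.bool)
            fill[dy::window, dx::window] = True
            fill &= tokens == mask_id
            tokens[fill] = sampled[fill]
    return tokens

In this sketch a 60x60 grid with 4x4 windows needs only 16 forward passes instead of 3600 sequential steps, which is the source of the claimed speedup; the real system additionally conditions on the low-resolution tokens produced by the base CogLM.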

Code Repositories

thudm/cogview2 (Official, PyTorch). Mentioned in GitHub.

Benchmarks

Benchmark | Methodology | Metrics
text-to-image-generation-on-coco | CogView2 (6B, Finetuned) | FID: 17.7
text-to-image-generation-on-coco | CogView2 (6B, Finetuned) | FID: 24
