Generative Adversarial Transformers

Drew A. Hudson, C. Lawrence Zitnick

Abstract

We introduce the GANformer, a novel and efficient type of transformer, and explore it for the task of visual generative modeling. The network employs a bipartite structure that enables long-range interactions across the image, while maintaining computation of linear efficiency, that can readily scale to high-resolution synthesis. It iteratively propagates information from a set of latent variables to the evolving visual features and vice versa, to support the refinement of each in light of the other and encourage the emergence of compositional representations of objects and scenes. In contrast to the classic transformer architecture, it utilizes multiplicative integration that allows flexible region-based modulation, and can thus be seen as a generalization of the successful StyleGAN network. We demonstrate the model's strength and robustness through a careful evaluation over a range of datasets, from simulated multi-object environments to rich real-world indoor and outdoor scenes, showing it achieves state-of-the-art results in terms of image quality and diversity, while enjoying fast learning and better data-efficiency. Further qualitative and quantitative experiments offer us an insight into the model's inner workings, revealing improved interpretability and stronger disentanglement, and illustrating the benefits and efficacy of our approach. An implementation of the model is available at https://github.com/dorarad/gansformer.
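The bipartite idea described in the abstract can be illustrated with a small sketch: image positions attend to a handful of latent variables (cost proportional to n x k rather than n x n), and the attended latent information then modulates the normalized features multiplicatively, in the spirit of StyleGAN-style modulation. This is only a minimal pure-Python illustration under stated assumptions, not the official implementation: the function name `bipartite_modulation`, the weight matrices `w_q`, `w_k`, `w_gain`, `w_bias`, and the toy sizes are all hypothetical, and only the latents-to-features direction is shown.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def normalize_rows(m, eps=1e-5):
    """Zero-mean, unit-variance normalization of each feature vector."""
    out = []
    for row in m:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


def bipartite_modulation(features, latents, w_q, w_k, w_gain, w_bias):
    """features: n x d image positions, latents: k x d latent variables.

    Hypothetical sketch: each position attends over the k latents, and the
    pooled latent vector sets a per-position gain and bias.
    """
    d = len(features[0])
    q = matmul(features, w_q)                      # n x d queries from positions
    k = matmul(latents, w_k)                       # k x d keys from latents
    scores = matmul(q, transpose(k))               # n x k, linear in n for fixed k
    attn = softmax_rows([[s / math.sqrt(d) for s in row] for row in scores])
    pooled = matmul(attn, latents)                 # n x d mix of latents per position
    gain = matmul(pooled, w_gain)
    bias = matmul(pooled, w_bias)
    normed = normalize_rows(features)
    # Multiplicative integration: gain * normalized feature + bias.
    out = [[g * x + b for g, x, b in zip(gr, xr, br)]
           for gr, xr, br in zip(gain, normed, bias)]
    return out, attn


random.seed(0)
n, k, d = 16, 3, 4  # toy sizes: a 4x4 grid, 3 latents, 4 channels


def rand(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


out, attn = bipartite_modulation(rand(n, d), rand(k, d),
                                 rand(d, d), rand(d, d), rand(d, d), rand(d, d))
print(len(out), len(out[0]), len(attn[0]))
```

Each row of `attn` is a soft assignment of one image position to the latents, which is what makes region-based modulation possible in this sketch; with a single latent it reduces to one global gain and bias shared by every position.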

Code Repositories

dorarad/gansformer (official, TensorFlow; mentioned in GitHub)
lucidrains/transganformer (PyTorch; mentioned in GitHub)

Benchmarks

Benchmark                                   Methodology  Metrics
------------------------------------------  -----------  ------------------------------------------------------------
image-generation-on-cityscapes              GAN          FID-10k-training-steps: 11.5652
image-generation-on-cityscapes              StyleGAN2    FID-10k-training-steps: 8.35
image-generation-on-cityscapes              GANformer    FID-10k-training-steps: 5.7589
image-generation-on-cityscapes              SAGAN        FID-10k-training-steps: 12.8077
image-generation-on-cityscapes              VQGAN        FID-10k-training-steps: 173.7971
image-generation-on-clevr                   VQGAN        FID-5k-training-steps: 32.6031
image-generation-on-clevr                   GAN          FID-5k-training-steps: 25.0244
image-generation-on-clevr                   SAGAN        FID-5k-training-steps: 26.0433
image-generation-on-clevr                   StyleGAN2    FID-5k-training-steps: 16.0534
image-generation-on-clevr                   GANformer    FID-5k-training-steps: 9.1679
image-generation-on-ffhq                    GAN          FID-10k-training-steps: 13.1844
image-generation-on-ffhq                    SAGAN        FID-10k-training-steps: 16.2069
image-generation-on-ffhq                    StyleGAN2    Clean-FID (70k): 2.98; FID-10k-training-steps: 10.8309
image-generation-on-ffhq                    VQGAN        FID-10k-training-steps: 63.1165
image-generation-on-ffhq                    GANsformer   FID-10k-training-steps: 12.8478
image-generation-on-ffhq-256-x-256          GANFormer    FID: 7.42
image-generation-on-lsun-bedroom-256-x-256  SAGAN        FID-10k-training-steps: 14.0595
image-generation-on-lsun-bedroom-256-x-256  StyleGAN2    FID-10k-training-steps: 11.5255
image-generation-on-lsun-bedroom-256-x-256  VQGAN        FID-10k-training-steps: 59.6333
image-generation-on-lsun-bedroom-256-x-256  GAN          FID-10k-training-steps: 12.1567
image-generation-on-lsun-bedroom-256-x-256  GANformer    FID-10k-training-steps: 6.5085
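
To make the comparison with StyleGAN2 easier to read, the short sketch below computes the relative FID reduction (lower FID is better) on each benchmark where both methods report the same metric. The FID values are copied from the listing above; the dictionary layout and the helper name `relative_reduction` are assumptions made for illustration, and the FFHQ entry uses the row listed as "GANsformer".

```python
# FID values copied from the benchmark listing (lower is better).
# CLEVR is reported at 5k training steps, the others at 10k.
fid = {
    "cityscapes": {"StyleGAN2": 8.35, "GANformer": 5.7589},
    "clevr": {"StyleGAN2": 16.0534, "GANformer": 9.1679},
    "lsun-bedroom-256-x-256": {"StyleGAN2": 11.5255, "GANformer": 6.5085},
    "ffhq": {"StyleGAN2": 10.8309, "GANformer": 12.8478},  # listed as "GANsformer"
}


def relative_reduction(baseline, candidate):
    """Percentage by which `candidate` lowers FID relative to `baseline`."""
    return 100.0 * (baseline - candidate) / baseline


reductions = {
    name: relative_reduction(scores["StyleGAN2"], scores["GANformer"])
    for name, scores in fid.items()
}

for name, pct in reductions.items():
    print(f"{name}: {pct:+.1f}%")
```

With these listed values, the reduction is roughly 31% on Cityscapes and about 43% on both CLEVR and LSUN Bedroom, while the FFHQ entry at 10k training steps is about 19% worse than StyleGAN2.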
