Long Zhao, Zizhao Zhang, Ting Chen, Dimitris N. Metaxas, Han Zhang

Abstract
Attention-based models, exemplified by the Transformer, can effectively model long-range dependencies, but they suffer from the quadratic complexity of the self-attention operation, which makes them difficult to adopt for high-resolution image generation based on Generative Adversarial Networks (GANs). In this paper, we introduce two key ingredients to the Transformer to address this challenge. First, in the low-resolution stages of the generative process, standard global self-attention is replaced with the proposed multi-axis blocked self-attention, which allows efficient mixing of local and global attention. Second, in the high-resolution stages, we drop self-attention and keep only multi-layer perceptrons, reminiscent of the implicit neural function. To further improve performance, we introduce an additional self-modulation component based on cross-attention. The resulting model, denoted as HiT, has nearly linear computational complexity with respect to the image size and thus scales directly to synthesizing high-definition images. Our experiments show that the proposed HiT achieves state-of-the-art FID scores of 30.83 and 2.95 on unconditional ImageNet $128 \times 128$ and FFHQ $256 \times 256$, respectively, with a reasonable throughput. We believe the proposed HiT is an important milestone for GAN generators that are completely free of convolutions. Our code is publicly available at https://github.com/google-research/hit-gan
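To make the complexity argument concrete, the following minimal sketch counts query-key pairs for global self-attention versus attention restricted to non-overlapping square blocks. It is an illustration under stated assumptions, not the HiT implementation: the function names and the block size of 8 are hypothetical, and only the local (within-block) axis is modeled, not the full multi-axis scheme.

```python
def global_attention_pairs(height: int, width: int) -> int:
    """Query-key pairs when every token attends to every other token."""
    num_tokens = height * width
    return num_tokens * num_tokens


def blocked_attention_pairs(height: int, width: int, block: int) -> int:
    """Query-key pairs when tokens attend only within their own block.

    Assumes non-overlapping square blocks of side `block` (an
    illustrative choice, not a value taken from the paper).
    """
    if height % block or width % block:
        raise ValueError("height and width must be divisible by block")
    num_blocks = (height // block) * (width // block)
    tokens_per_block = block * block
    return num_blocks * tokens_per_block * tokens_per_block


if __name__ == "__main__":
    for side in (32, 64, 128):
        g = global_attention_pairs(side, side)
        b = blocked_attention_pairs(side, side, block=8)
        print(f"{side}x{side}: global={g} blocked={b} ratio={g // b}")
```

With a fixed block size, doubling the side length quadruples the blocked cost (linear in the number of pixels) but multiplies the global cost by sixteen (quadratic in the number of pixels).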
Benchmarks
| Benchmark | Model | Metric |
|---|---|---|
| image-generation-on-celeba-256x256 | HiT-B | FID: 3.39 |
| image-generation-on-celeba-hq-1024x1024 | HiT-B | FID: 8.83 |
| image-generation-on-ffhq | HiT-B | FID: 6.37 |
| image-generation-on-ffhq-1024-x-1024 | HiT-B | FID: 6.37 |
| image-generation-on-ffhq-256-x-256 | HiT-S | FID: 3.06 |
| image-generation-on-ffhq-256-x-256 | HiT-L | FID: 2.58 |
| image-generation-on-ffhq-256-x-256 | HiT-B | FID: 2.95 |
| image-generation-on-imagenet-128x128 | HiT | FID: 30.83 |