
Abstract
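We introduce GANformer, a novel and efficient Transformer architecture, and apply it to visual generative modeling. The network uses a bipartite graph structure that enables long-range interactions across the image while keeping computation linear in complexity, so it scales readily to high-resolution image synthesis. GANformer iteratively propagates information between a set of latent variables and the evolving visual features, so that each is refined in light of the other, which encourages compositional representations of objects and scenes to emerge. Unlike the classic Transformer architecture, GANformer uses multiplicative integration, which allows flexible region-based modulation, and it can therefore be seen as a generalization of the successful StyleGAN model. We evaluate the model carefully on a range of datasets, from simulated multi-object environments to rich real-world indoor and outdoor scenes, and show that it achieves state-of-the-art image quality and diversity while learning quickly and using data more efficiently. Further qualitative and quantitative experiments shed light on the model's inner workings, showing improved interpretability and stronger disentanglement, and illustrate the benefits and effectiveness of the approach. An open-source implementation of the model is available at https://github.com/dorarad/gansformer.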
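
The bipartite attention and multiplicative integration described in the abstract can be illustrated with a minimal NumPy sketch. It is not the code from the linked repository: the function names `bipartite_attention` and `multiplicative_update`, the weight matrices, and the toy dimensions are all assumptions made for illustration, and only one direction of the exchange (latents to image features) is shown. Each of the `n` image positions attends to `k` latents, so the attention map has `n * k` entries, rather than the `n * n` entries of self-attention over positions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def bipartite_attention(features, latents, w_q, w_k, w_v):
    """Attend from n image positions to k latents (hypothetical helper)."""
    q = features @ w_q                           # (n, d) queries from image features
    k = latents @ w_k                            # (k, d) keys from latents
    v = latents @ w_v                            # (k, d) values from latents
    scores = q @ k.T / np.sqrt(q.shape[-1])      # (n, k) attention map
    weights = softmax(scores, axis=-1)           # each position distributes over latents
    return weights @ v, weights

def multiplicative_update(features, attended, w_gamma, w_beta):
    """Scale and shift normalized features per position (assumed parameterization)."""
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True) + 1e-8
    normalized = (features - mean) / std
    gamma = attended @ w_gamma                   # per-position scale
    beta = attended @ w_beta                     # per-position bias
    return gamma * normalized + beta

rng = np.random.default_rng(0)
n, k, d = 64, 4, 8                               # toy sizes: 8x8 grid, 4 latents
features = rng.normal(size=(n, d))
latents = rng.normal(size=(k, d))
w_q, w_k, w_v, w_gamma, w_beta = (rng.normal(size=(d, d)) * 0.1 for _ in range(5))

attended, weights = bipartite_attention(features, latents, w_q, w_k, w_v)
updated = multiplicative_update(features, attended, w_gamma, w_beta)
print(weights.shape, updated.shape)              # (64, 4) (64, 8)
```

Because the scale and bias are computed per position from the attended latents, different image regions can be modulated by different latents. With a single latent, every position would receive the same modulation, which is the sense in which the approach generalizes a single global style.
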
Code repositories
dorarad/gansformer
Official
tf
Mentioned in GitHub
lucidrains/transganformer
pytorch
Mentioned in GitHub
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| image-generation-on-cityscapes | GAN | FID-10k-training-steps: 11.5652 |
| image-generation-on-cityscapes | StyleGAN2 | FID-10k-training-steps: 8.35 |
| image-generation-on-cityscapes | GANformer | FID-10k-training-steps: 5.7589 |
| image-generation-on-cityscapes | SAGAN | FID-10k-training-steps: 12.8077 |
| image-generation-on-cityscapes | VQGAN | FID-10k-training-steps: 173.7971 |
| image-generation-on-clevr | VQGAN | FID-5k-training-steps: 32.6031 |
| image-generation-on-clevr | GAN | FID-5k-training-steps: 25.0244 |
| image-generation-on-clevr | SAGAN | FID-5k-training-steps: 26.0433 |
| image-generation-on-clevr | StyleGAN2 | FID-5k-training-steps: 16.0534 |
| image-generation-on-clevr | GANformer | FID-5k-training-steps: 9.1679 |
| image-generation-on-ffhq | GAN | FID-10k-training-steps: 13.1844 |
| image-generation-on-ffhq | SAGAN | FID-10k-training-steps: 16.2069 |
| image-generation-on-ffhq | StyleGAN2 | Clean-FID (70k): 2.98; FID-10k-training-steps: 10.8309 |
| image-generation-on-ffhq | VQGAN | FID-10k-training-steps: 63.1165 |
| image-generation-on-ffhq | GANformer | FID-10k-training-steps: 12.8478 |
| image-generation-on-ffhq-256-x-256 | GANformer | FID: 7.42 |
| image-generation-on-lsun-bedroom-256-x-256 | SAGAN | FID-10k-training-steps: 14.0595 |
| image-generation-on-lsun-bedroom-256-x-256 | StyleGAN2 | FID-10k-training-steps: 11.5255 |
| image-generation-on-lsun-bedroom-256-x-256 | VQGAN | FID-10k-training-steps: 59.6333 |
| image-generation-on-lsun-bedroom-256-x-256 | GAN | FID-10k-training-steps: 12.1567 |
| image-generation-on-lsun-bedroom-256-x-256 | GANformer | FID-10k-training-steps: 6.5085 |
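
The metrics in the table are Fréchet Inception Distance (FID) scores, where lower is better. As a rough illustration of what the number measures, the sketch below computes the Fréchet distance between two Gaussians fitted to two sets of feature vectors. It assumes the features have already been extracted (FID conventionally uses Inception network activations), the function name `frechet_distance` is hypothetical, and this is not the evaluation code that produced the values above.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets of shape (N, d)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative in exact
    # arithmetic; clip to guard against small negative round-off.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))       # stand-in for features of real images
shifted = real + 2.0                   # same covariance, mean shifted by 2 per dimension
print(frechet_distance(real, real))    # approximately 0
print(frechet_distance(real, shifted)) # approximately 16 (= 4 dimensions * 2^2)
```
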