Zheng Chen, Yulun Zhang, Jinjin Gu, Linghe Kong, Xiaokang Yang, Fisher Yu

Abstract
Transformer has recently gained considerable popularity in low-level vision tasks, including image super-resolution (SR). These networks utilize self-attention along different dimensions, spatial or channel, and achieve impressive performance. This inspires us to combine the two dimensions in Transformer for a more powerful representation capability. Based on this idea, we propose a novel Transformer model, Dual Aggregation Transformer (DAT), for image SR. Our DAT aggregates features across spatial and channel dimensions, in an inter-block and intra-block dual manner. Specifically, we alternately apply spatial and channel self-attention in consecutive Transformer blocks. This alternating strategy enables DAT to capture the global context and realize inter-block feature aggregation. Furthermore, we propose the adaptive interaction module (AIM) and the spatial-gate feed-forward network (SGFN) to achieve intra-block feature aggregation. AIM complements each of the two self-attention mechanisms with information from the corresponding other dimension. Meanwhile, SGFN introduces additional non-linear spatial information into the feed-forward network. Extensive experiments show that our DAT surpasses current methods. Code and models are available at https://github.com/zhengchen1999/DAT.
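To make the alternation concrete, below is a minimal PyTorch-style sketch of the block arrangement the abstract describes: consecutive Transformer blocks that switch between spatial and channel self-attention, each followed by a spatially gated feed-forward network. Everything here (module names, the use of plain global attention in place of the paper's windowed attention, the gating layout, and all hyperparameters) is an illustrative assumption, and AIM is omitted for brevity; the authors' actual implementation lives in the linked repository.

```python
# Illustrative sketch only: all names, shapes, and hyperparameters are
# assumptions, not the authors' code (see the linked repository for that).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialAttention(nn.Module):
    """Self-attention over spatial positions (the paper uses windowed
    attention; plain global attention is substituted here for brevity)."""

    def __init__(self, dim, num_heads):
        super().__init__()
        self.mha = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):  # x: (B, N, C), N = H*W tokens
        out, _ = self.mha(x, x, x, need_weights=False)
        return out


class ChannelAttention(nn.Module):
    """Self-attention over channels: the attention map is (C/heads)^2, so
    each channel attends to all others using global spatial statistics."""

    def __init__(self, dim, num_heads):
        super().__init__()
        self.num_heads = num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):  # x: (B, N, C)
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 4, 1)  # each (B, heads, C/heads, N)
        q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)).softmax(dim=-1)  # channel-to-channel
        out = (attn @ v).permute(0, 3, 1, 2).reshape(B, N, C)
        return self.proj(out)


class SpatialGateFFN(nn.Module):
    """Feed-forward network whose hidden features are split in two: one half
    passes through a depth-wise conv to form a spatial gate for the other."""

    def __init__(self, dim, expansion=2):
        super().__init__()
        hidden = dim * expansion
        self.fc1 = nn.Linear(dim, hidden)
        self.dwconv = nn.Conv2d(hidden // 2, hidden // 2, 3, padding=1,
                                groups=hidden // 2)
        self.fc2 = nn.Linear(hidden // 2, dim)

    def forward(self, x, hw):  # x: (B, N, C); hw = (H, W), N = H*W
        B, N, _ = x.shape
        h, w = hw
        x1, x2 = F.gelu(self.fc1(x)).chunk(2, dim=-1)
        gate = self.dwconv(x2.transpose(1, 2).reshape(B, -1, h, w))
        gate = gate.flatten(2).transpose(1, 2)  # back to (B, N, hidden/2)
        return self.fc2(x1 * gate)  # gate injects non-linear spatial info


class DATBlock(nn.Module):
    def __init__(self, dim, num_heads, channel_attn):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        attn_cls = ChannelAttention if channel_attn else SpatialAttention
        self.attn = attn_cls(dim, num_heads)
        self.ffn = SpatialGateFFN(dim)

    def forward(self, x, hw):
        x = x + self.attn(self.norm1(x))
        return x + self.ffn(self.norm2(x), hw)


# Alternate spatial / channel attention across consecutive blocks.
blocks = nn.ModuleList(
    DATBlock(dim=60, num_heads=4, channel_attn=(i % 2 == 1)) for i in range(4)
)
x = torch.randn(2, 32 * 32, 60)  # (batch, tokens, channels)
for blk in blocks:
    x = blk(x, hw=(32, 32))
print(x.shape)  # torch.Size([2, 1024, 60])
```

The even-indexed blocks here run spatial attention and the odd-indexed ones run channel attention, which is one plausible reading of "alternately apply ... in consecutive Transformer blocks"; the actual ordering and block counts are defined in the repository.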
Benchmarks
| Benchmark | Method | PSNR (dB) | SSIM |
|---|---|---|---|
| Manga109 4× upscaling | DAT+ | 32.67 | 0.9301 |
| Manga109 4× upscaling | DAT | 32.51 | 0.9291 |
| Set14 4× upscaling | DAT+ | 29.29 | 0.7983 |
| Set14 4× upscaling | DAT | 29.23 | 0.7973 |
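For reference, PSNR in the table above is the standard peak signal-to-noise ratio. A common SR evaluation convention, assumed in the sketch below (check the repository for DAT's exact protocol), crops `scale` border pixels before measuring:

```python
import numpy as np


def psnr(sr, hr, scale=4):
    """PSNR between two uint8 images; crops `scale` border pixels first,
    following the usual SR convention (exact protocol: see the DAT repo)."""
    sr = sr.astype(np.float64)[scale:-scale, scale:-scale]
    hr = hr.astype(np.float64)[scale:-scale, scale:-scale]
    mse = np.mean((sr - hr) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse) if mse > 0 else float("inf")
```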