
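Abstract
Image restoration is a long-standing low-level vision problem that aims to restore high-quality images from low-quality ones (e.g., downscaled, noisy, and compressed images). While state-of-the-art image restoration methods are mainly based on convolutional neural networks, few attempts have been made with Transformers, which show impressive performance on high-level vision tasks. This paper proposes SwinIR, a strong baseline model for image restoration based on the Swin Transformer. SwinIR consists of three parts: shallow feature extraction, deep feature extraction, and high-quality image reconstruction. In particular, the deep feature extraction module is composed of several residual Swin Transformer blocks (RSTB), each of which contains several Swin Transformer layers together with a residual connection. We conduct experiments on three representative tasks: image super-resolution (including classical, lightweight, and real-world image super-resolution), image denoising (including grayscale and color image denoising), and JPEG compression artifact reduction. Experimental results show that SwinIR outperforms state-of-the-art methods on different tasks by up to 0.14~0.45 dB, while the total number of parameters can be reduced by up to 67%.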
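
The three-part layout described in the abstract can be illustrated with a minimal PyTorch sketch. This is not the official implementation. The class names (`PlaceholderLayer`, `RSTB`, `TinySwinIRSketch`) and all hyperparameters are hypothetical. The Swin Transformer layer is replaced here by plain global multi-head self-attention rather than shifted-window attention. The pixel-shuffle reconstruction head is an assumption chosen for a 2x super-resolution example.

```python
import torch
import torch.nn as nn


class PlaceholderLayer(nn.Module):
    """Stand-in for one Swin Transformer layer.

    Assumption: uses global self-attention, NOT shifted-window attention.
    """

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 2 * dim), nn.GELU(), nn.Linear(2 * dim, dim)
        )

    def forward(self, x):  # x: (batch, tokens, dim)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))


class RSTB(nn.Module):
    """Residual block: several layers wrapped by one residual connection."""

    def __init__(self, dim, depth):
        super().__init__()
        self.layers = nn.Sequential(*[PlaceholderLayer(dim) for _ in range(depth)])

    def forward(self, x):
        return x + self.layers(x)


class TinySwinIRSketch(nn.Module):
    """Shallow features -> deep features (RSTBs) -> reconstruction."""

    def __init__(self, in_ch=3, dim=32, num_blocks=2, depth=2, scale=2):
        super().__init__()
        self.shallow = nn.Conv2d(in_ch, dim, 3, padding=1)
        self.deep = nn.Sequential(*[RSTB(dim, depth) for _ in range(num_blocks)])
        # Hypothetical upsampling head: conv followed by pixel shuffle.
        self.reconstruct = nn.Sequential(
            nn.Conv2d(dim, in_ch * scale**2, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, x):
        feat = self.shallow(x)                      # (B, C, H, W)
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)    # (B, H*W, C)
        deep = self.deep(tokens).transpose(1, 2).reshape(b, c, h, w)
        # Fuse shallow and deep features before reconstruction.
        return self.reconstruct(feat + deep)


model = TinySwinIRSketch()
out = model(torch.randn(1, 3, 16, 16))
print(tuple(out.shape))
```

With the default `scale=2`, a 16x16 input comes out as a 32x32 image with the same number of channels. Global attention scales quadratically with the number of pixels, so this sketch is only practical on small inputs.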
Code Repositories
XPixelGroup/BasicSR
pytorch
mv-lab/swin2sr
pytorch
Mentioned in GitHub
rami0205/ngramswin
pytorch
Mentioned in GitHub
ayanglab/swinmr
pytorch
Mentioned in GitHub
pilot7747/sldl
pytorch
skchen1993/SwinIR
pytorch
Mentioned in GitHub
jingyunliang/vrt
pytorch
Mentioned in GitHub
jingyunliang/swinir
Official
pytorch
Mentioned in GitHub
ayanglab/swinganmr
pytorch
Mentioned in GitHub
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| color-image-denoising-on-kodak24-sigma50 | SwinIR | PSNR: 29.79 |
| color-image-denoising-on-urban100-sigma10 | SwinIR | PSNR: 35.13 |
| color-image-denoising-on-urban100-sigma15-1 | SwinIR | Average PSNR: 35.13 |
| color-image-denoising-on-urban100-sigma25 | SwinIR | PSNR: 32.9 |
| color-image-denoising-on-urban100-sigma50 | SwinIR | PSNR: 29.82 |
| grayscale-image-denoising-on-bsd68-sigma15 | SwinIR | PSNR: 31.97 |
| grayscale-image-denoising-on-urban100-sigma15 | SwinIR | PSNR: 33.70 |
| grayscale-image-denoising-on-urban100-sigma25 | SwinIR | PSNR: 31.3 |
| grayscale-image-denoising-on-urban100-sigma50 | SwinIR | PSNR: 27.98 |
| image-super-resolution-on-manga109-4x | SwinIR | PSNR: 32.22, SSIM: 0.9273 |
| image-super-resolution-on-set14-4x-upscaling | SwinIR | PSNR: 29.15, SSIM: 0.7958 |
| image-super-resolution-on-urban100-4x | SwinIR | PSNR: 27.45, SSIM: 0.8254 |
| video-super-resolution-on-msu-super-1 | SwinIR + vvenc | BSQ-rate over ERQA: 6.624, BSQ-rate over LPIPS: 1.552, BSQ-rate over MS-SSIM: 5.758, BSQ-rate over PSNR: 8.971, BSQ-rate over Subjective Score: 1.35, BSQ-rate over VMAF: 0.887 |
| video-super-resolution-on-msu-super-1 | SwinIR + aomenc | BSQ-rate over ERQA: 10.854, BSQ-rate over LPIPS: 4.566, BSQ-rate over MS-SSIM: 7.105, BSQ-rate over PSNR: 15.144, BSQ-rate over Subjective Score: 0.835, BSQ-rate over VMAF: 3.32 |
| video-super-resolution-on-msu-super-1 | SwinIR + uavs3e | BSQ-rate over ERQA: 6.803, BSQ-rate over LPIPS: 1.671, BSQ-rate over MS-SSIM: 4.411, BSQ-rate over PSNR: 15.144, BSQ-rate over Subjective Score: 0.639, BSQ-rate over VMAF: 1.848 |
| video-super-resolution-on-msu-super-1 | SwinIR + x265 | BSQ-rate over ERQA: 1.575, BSQ-rate over LPIPS: 1.474, BSQ-rate over MS-SSIM: 4.641, BSQ-rate over PSNR: 8.13, BSQ-rate over Subjective Score: 0.346, BSQ-rate over VMAF: 1.304 |
| video-super-resolution-on-msu-super-1 | SwinIR + x264 | BSQ-rate over ERQA: 0.76, BSQ-rate over LPIPS: 0.559, BSQ-rate over MS-SSIM: 0.736, BSQ-rate over PSNR: 6.268, BSQ-rate over Subjective Score: 0.304, BSQ-rate over VMAF: 0.642 |
| video-super-resolution-on-msu-video-upscalers | SwinIR-Real-B | LPIPS: 0.183, PSNR: 28.86, SSIM: 0.830 |
| video-super-resolution-on-msu-video-upscalers | SwinIR-Real-S | LPIPS: 0.189, PSNR: 28.55, SSIM: 0.845 |
| video-super-resolution-on-msu-vsr-benchmark | SwinIR | 1 - LPIPS: 0.895, ERQAv1.0: 0.618, FPS: 0.407, PSNR: 25.12, QRCRv1.0: 0, SSIM: 0.782, Subjective score: 4.799 |
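
Most rows above report PSNR in dB. As a rough illustration of what those numbers measure, the following sketch computes PSNR from the standard definition for two flat sequences of pixel values, assuming an 8-bit peak value of 255. The function names `psnr` and `mse_ratio_for_gain` are hypothetical, and actual benchmark protocols may differ in details such as color space or border cropping.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks when PSNR improves by gain_db."""
    return 10 ** (-gain_db / 10)


print(psnr([10, 20, 30], [11, 21, 31]))
print(mse_ratio_for_gain(0.14), mse_ratio_for_gain(0.45))
```

Because PSNR is logarithmic in the mean squared error, the 0.14 to 0.45 dB gains quoted in the abstract correspond to roughly 3% to 10% lower MSE.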