Abstract
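Video super-resolution aims to recover high-resolution (HR) video from low-resolution (LR) video. Previous methods typically use optical flow for frame alignment and design their network frameworks along the spatial and temporal dimensions. Optical flow estimation is error-prone, however, and those errors degrade reconstruction quality. How to fuse features from multiple video frames efficiently also remains a challenging problem. This paper proposes a novel Local-Global Fusion Network (LGFN) to address these issues. Unlike conventional methods that rely on optical flow, this work uses deformable convolutions (DCs) combined with decreased multi-dilation convolution units (DMDCUs) to achieve efficient, implicit frame alignment. We further design a two-branch structure, consisting of a Local Fusion Module (LFM) and a Global Fusion Module (GFM), that fuses video information from different perspectives: the LFM focuses on the relationships between adjacent frames to preserve temporal consistency, while the GFM uses a video shuffle strategy to exploit all relevant features globally and improve information integration. Owing to the proposed architecture, experiments on multiple datasets show that LGFN not only achieves performance comparable to current state-of-the-art methods but also restores a variety of video frames reliably. The results of LGFN on benchmark datasets are published at https://github.com/BIOINSu/LGFN, and the source code will be released as soon as the paper is accepted.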
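To make the idea of implicit alignment concrete, the following is a minimal sketch of how one output value of a 3x3 deformable convolution is computed: each kernel tap reads the feature map at its regular grid position plus a per-tap offset, using bilinear interpolation. It is an illustration of the general deformable-convolution operation only, not the LGFN implementation; the function names `bilinear_sample` and `deform_conv_at`, the list-of-lists feature map, and the hand-set offsets are all assumptions made for this example (in a real network the offsets are predicted by a learned layer).

```python
import math


def bilinear_sample(feat, y, x):
    """Sample a 2D list `feat` at fractional (y, x); out-of-range reads as 0."""
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    # Blend the four integer neighbours, weighted by their distance to (y, x).
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * feat[yy][xx]
    return value


def deform_conv_at(feat, weights, offsets, cy, cx):
    """One output value of a 3x3 deformable convolution centred at (cy, cx).

    weights: 3x3 kernel.
    offsets: 3x3 grid of (dy, dx) pairs, one per kernel tap (hypothetical
             values here; a network would predict them).
    """
    out = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i][j]
            # Regular grid position (i - 1, j - 1) shifted by the tap's offset.
            out += weights[i][j] * bilinear_sample(
                feat, cy + (i - 1) + dy, cx + (j - 1) + dx
            )
    return out


# Toy 4x4 feature map and a kernel that only passes the centre tap.
feat = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
centre_only = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]

zero_offsets = [[(0.0, 0.0)] * 3 for _ in range(3)]
half_pixel_right = [[(0.0, 0.5)] * 3 for _ in range(3)]

plain = deform_conv_at(feat, centre_only, zero_offsets, 1, 1)
shifted = deform_conv_at(feat, centre_only, half_pixel_right, 1, 1)
```

With zero offsets the operation reduces to an ordinary convolution; non-zero offsets let each tap read from a displaced location, which is how a neighbouring frame's features can be pulled toward the reference frame without an explicit optical-flow warp.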
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| video-super-resolution-on-msu-super-1 | LGFN + aomenc | BSQ-rate over ERQA: 14.631; BSQ-rate over LPIPS: 5.536; BSQ-rate over MS-SSIM: 4.321; BSQ-rate over PSNR: 9.79; BSQ-rate over VMAF: 1.99 |
| video-super-resolution-on-msu-super-1 | LGFN + x264 | BSQ-rate over ERQA: 1.704; BSQ-rate over LPIPS: 1.324; BSQ-rate over MS-SSIM: 0.77; BSQ-rate over PSNR: 1.151; BSQ-rate over VMAF: 0.744 |
| video-super-resolution-on-msu-super-1 | LGFN + vvenc | BSQ-rate over ERQA: 18.342; BSQ-rate over LPIPS: 11.759; BSQ-rate over MS-SSIM: 0.889; BSQ-rate over PSNR: 5.768; BSQ-rate over Subjective Score: 2.944; BSQ-rate over VMAF: 1.626 |
| video-super-resolution-on-msu-super-1 | LGFN + x265 | BSQ-rate over ERQA: 13.213; BSQ-rate over LPIPS: 11.399; BSQ-rate over MS-SSIM: 1.533; BSQ-rate over PSNR: 6.646; BSQ-rate over VMAF: 1.341 |
| video-super-resolution-on-msu-super-1 | LGFN + uavs3e | BSQ-rate over ERQA: 9.279; BSQ-rate over LPIPS: 4.504; BSQ-rate over MS-SSIM: 2.427; BSQ-rate over PSNR: 5.503; BSQ-rate over VMAF: 1.625 |
| video-super-resolution-on-msu-video-upscalers | LGFN | PSNR: 27.42; SSIM: 0.939; VMAF: 57.79 |
| video-super-resolution-on-msu-vsr-benchmark | LGFN | 1 - LPIPS: 0.903; ERQAv1.0: 0.74; FPS: 0.667; PSNR: 31.291; QRCRv1.0: 0.629; SSIM: 0.898; Subjective score: 6.505 |
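
For readers unfamiliar with the PSNR figures in the table, the sketch below computes PSNR from its standard definition, 10 · log10(MAX² / MSE), over two equal-length sequences of pixel values. It is a minimal illustration only: the function name `psnr`, the flat-list input, and the 8-bit peak value of 255 are assumptions for this example, and the benchmarks' own evaluation protocols (colour channel, cropping, frame selection) are not specified on this page and may differ.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flat pixel sequences."""
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical pixel values: each restored pixel is off by 10 grey levels.
score = psnr([100, 100], [110, 90])
```

Higher values indicate a restored frame closer to the reference; because the scale is logarithmic, a gain of a few tenths of a dB already corresponds to a measurable reduction in mean squared error.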