4 months ago

VRT: A Video Restoration Transformer

Abstract

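Video restoration (e.g., video super-resolution) aims to recover high-quality frames from low-quality frames. Unlike single-image restoration, video restoration generally needs to exploit temporal information from multiple adjacent but usually misaligned video frames. Existing deep learning methods generally address this with either a sliding-window strategy or a recurrent architecture, but the former is restricted to frame-by-frame restoration and the latter lacks long-range modelling ability. In this paper, we propose a Video Restoration Transformer (VRT) with parallel frame prediction and long-range temporal dependency modelling abilities. Specifically, VRT is composed of multiple scales, each of which consists of two kinds of modules: Temporal Mutual Self Attention (TMSA) and Parallel Warping. TMSA divides the video into small clips, on which mutual attention is applied for joint motion estimation, feature alignment, and feature fusion, while self attention is used for feature extraction. To enable cross-clip interactions, the video sequence is shifted in every other layer. In addition, parallel warping further fuses information from neighboring frames through parallel feature warping. Experimental results on five tasks (video super-resolution, video deblurring, video denoising, video frame interpolation, and space-time video super-resolution) show that VRT outperforms existing state-of-the-art methods by large margins (up to 2.16 dB) on fourteen benchmark datasets.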
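The clip partitioning, the shift in every other layer, and mutual attention described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names `partition_clips` and `mutual_attention` are hypothetical, the shift is assumed to be a cyclic shift by half a clip, and the learned query/key/value projections are replaced by the identity for brevity.

```python
import math

def partition_clips(num_frames, clip_size=2, shift=0):
    """Group frame indices into clips after cyclically shifting the sequence.

    Assumption: the shift is cyclic and no attention masking is applied.
    """
    order = [(i + shift) % num_frames for i in range(num_frames)]
    return [order[i:i + clip_size] for i in range(0, num_frames, clip_size)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mutual_attention(reference, support):
    """Cross-frame attention: queries from `reference`, keys/values from `support`.

    Each frame is a list of token feature vectors. Learned projections are
    omitted (identity), which is a simplification for illustration only.
    """
    dim = len(reference[0])
    output = []
    for query in reference:
        # Scaled dot-product score between this query and every support token.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in support
        ]
        weights = softmax(scores)
        # Weighted sum of support tokens, one output channel at a time.
        output.append([
            sum(w * value[c] for w, value in zip(weights, support))
            for c in range(dim)
        ])
    return output

# Even layers use the unshifted sequence; odd layers shift by half a clip,
# so frames that were in different clips now share one.
clip_size = 2
for layer in range(2):
    shift = clip_size // 2 if layer % 2 else 0
    print(layer, partition_clips(6, clip_size, shift))
```

With six frames and clips of two, the unshifted layer pairs frames (0, 1), (2, 3), (4, 5), while the shifted layer pairs (1, 2), (3, 4), (5, 0), which is how information can propagate across clip boundaries as layers are stacked.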

Code Repositories

jingyunliang/vrt (official, PyTorch, mentioned in GitHub)

Benchmarks

Benchmark | Method | Metrics
deblurring-on-based | VRT (GoPro) | ERQAv2.0: 0.74874; LPIPS: 0.08165; PSNR: 31.42945; SSIM: 0.94503; Subjective: 2.3854; VMAF: 66.72253
deblurring-on-based | VRT (REDS) | ERQAv2.0: 0.75056; LPIPS: 0.08248; PSNR: 30.97878; SSIM: 0.94601; Subjective: 1.5660; VMAF: 66.81782
deblurring-on-based-1 | VRT (GoPro) | PSNR: 31.42945; VMAF: 66.72253
deblurring-on-based-1 | VRT (REDS) | ERQAv2.0: 0.74874; LPIPS: 0.08248; PSNR: 30.97878; SSIM: 0.94503; VMAF: 66.81782
deblurring-on-dvd-1 | VRT | PSNR: 34.27
deblurring-on-gopro | VRT | PSNR: 34.81; SSIM: 0.9724
deblurring-on-reds | VRT | Average PSNR: 36.79
space-time-video-super-resolution-on-vimeo90k | VRT | PSNR: 36.98; SSIM: 0.9439
space-time-video-super-resolution-on-vimeo90k-1 | VRT | PSNR: 36.01; SSIM: 0.9434
video-denoising-on-davis-sigma10 | VRT | PSNR: 40.82
video-denoising-on-davis-sigma20 | VRT | PSNR: 38.15
video-denoising-on-davis-sigma30 | VRT | PSNR: 36.52
video-denoising-on-davis-sigma40 | VRT | PSNR: 35.32
video-denoising-on-davis-sigma50 | VRT | PSNR: 34.36
video-denoising-on-set8-sigma10 | VRT | PSNR: 37.88
video-denoising-on-set8-sigma20 | VRT | PSNR: 35.02
video-denoising-on-set8-sigma30 | VRT | PSNR: 33.35
video-denoising-on-set8-sigma40 | VRT | PSNR: 32.15
video-denoising-on-set8-sigma50 | VRT | PSNR: 31.22
video-frame-interpolation-on-vid4-4x | VRT | PSNR: 27.46; Parameters: 4450000; SSIM: 0.8392
video-super-resolution-on-msu-super-1 | VRT + uavs3e | BSQ-rate over ERQA: 6.619; BSQ-rate over LPIPS: 4.003; BSQ-rate over MS-SSIM: 1.982; BSQ-rate over PSNR: 5.862; BSQ-rate over Subjective Score: 2.511; BSQ-rate over VMAF: 1.425
video-super-resolution-on-msu-super-1 | VRT + aomenc | BSQ-rate over ERQA: 12.289; BSQ-rate over LPIPS: 4.429; BSQ-rate over MS-SSIM: 2.797; BSQ-rate over PSNR: 10.075; BSQ-rate over Subjective Score: 2.631; BSQ-rate over VMAF: 1.733
video-super-resolution-on-msu-super-1 | VRT + vvenc | BSQ-rate over ERQA: 18.333; BSQ-rate over LPIPS: 11.496; BSQ-rate over MS-SSIM: 0.836; BSQ-rate over PSNR: 5.777; BSQ-rate over Subjective Score: 2.235; BSQ-rate over VMAF: 0.652
video-super-resolution-on-msu-super-1 | VRT + x265 | BSQ-rate over ERQA: 8.92; BSQ-rate over LPIPS: 11.329; BSQ-rate over MS-SSIM: 1.257; BSQ-rate over PSNR: 6.634; BSQ-rate over Subjective Score: 2.023; BSQ-rate over VMAF: 1.217
video-super-resolution-on-msu-super-1 | VRT + x264 | BSQ-rate over ERQA: 1.578; BSQ-rate over LPIPS: 1.259; BSQ-rate over MS-SSIM: 0.662; BSQ-rate over PSNR: 1.09; BSQ-rate over Subjective Score: 1.245; BSQ-rate over VMAF: 0.7
video-super-resolution-on-msu-video-upscalers | VRT-Reds-L | LPIPS: 0.343; PSNR: 31.01; SSIM: 0.869
video-super-resolution-on-msu-vsr-benchmark | VRT | 1 - LPIPS: 0.929; ERQAv1.0: 0.758; FPS: 2.778; PSNR: 31.669; QRCRv1.0: 0.722; SSIM: 0.902; Subjective score: 7.628
video-super-resolution-on-udm10-4x-upscaling | VRT | PSNR: 41.05; SSIM: 0.9737
video-super-resolution-on-vid4-4x-upscaling | VRT | PSNR: 27.93; SSIM: 0.8425
video-super-resolution-on-vid4-4x-upscaling-1 | VRT | PSNR: 29.42; SSIM: 0.8795
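
Most of the benchmarks report PSNR in dB. As a reading aid, here is a minimal sketch of the standard PSNR definition, assuming 8-bit intensities (peak value 255) and flat lists of pixel values. The individual benchmarks may differ in details such as color space or border cropping, which this sketch does not model. The helper name `psnr` is hypothetical.

```python
import math

def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((r - x) ** 2 for r, x in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)

# Because PSNR is logarithmic in the mean squared error, a gain of g dB
# corresponds to dividing the MSE by 10 ** (g / 10).
gain_db = 2.16
print(10 ** (gain_db / 10))
```

Since the scale is logarithmic, a fixed dB improvement corresponds to a fixed multiplicative reduction in mean squared error, regardless of the starting PSNR.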
