5 months ago

VRT: A Video Restoration Transformer

Jingyun Liang; Jiezhang Cao; Yuchen Fan; Kai Zhang; Rakesh Ranjan; Yawei Li; Radu Timofte; Luc Van Gool

Abstract

Video restoration (e.g., video super-resolution) aims to restore high-quality frames from low-quality frames. Unlike single-image restoration, video restoration generally requires exploiting temporal information from multiple adjacent but usually misaligned video frames. Existing deep methods generally tackle this with either a sliding-window strategy or a recurrent architecture: the former is restricted to frame-by-frame restoration, while the latter lacks long-range modelling ability. In this paper, we propose a Video Restoration Transformer (VRT) with parallel frame prediction and long-range temporal dependency modelling abilities. More specifically, VRT is composed of multiple scales, each of which consists of two kinds of modules: temporal mutual self attention (TMSA) and parallel warping. TMSA divides the video into small clips, on which mutual attention is applied for joint motion estimation, feature alignment and feature fusion, while self attention is used for feature extraction. To enable cross-clip interactions, the video sequence is shifted for every other layer. In addition, parallel warping further fuses information from neighboring frames by warping their features in parallel. Experimental results on five tasks (video super-resolution, video deblurring, video denoising, video frame interpolation and space-time video super-resolution) demonstrate that VRT outperforms state-of-the-art methods by large margins (up to 2.16 dB) on fourteen benchmark datasets.
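
The clip partitioning and the shift on every other layer can be illustrated with a minimal sketch that works on frame indices only. The abstract does not state the clip size or the shift amount, so the cyclic shift by half a clip used here is an assumption, and the function names `partition_into_clips` and `clips_per_layer` are hypothetical and not taken from the official repository.

```python
def partition_into_clips(num_frames, clip_size, shift=0):
    """Group frame indices into non-overlapping clips after a cyclic shift."""
    if num_frames % clip_size != 0:
        raise ValueError("num_frames must be divisible by clip_size")
    # Cyclically shift the frame order so that clip boundaries move.
    order = [(i + shift) % num_frames for i in range(num_frames)]
    return [order[i:i + clip_size] for i in range(0, num_frames, clip_size)]


def clips_per_layer(num_frames, clip_size, num_layers):
    """Alternate unshifted and shifted partitions on successive layers."""
    layers = []
    for layer in range(num_layers):
        # Assumption: odd layers shift by half a clip; even layers do not shift.
        shift = clip_size // 2 if layer % 2 == 1 else 0
        layers.append(partition_into_clips(num_frames, clip_size, shift))
    return layers


if __name__ == "__main__":
    for layer, clips in enumerate(clips_per_layer(6, 2, 2)):
        print(layer, clips)
```

With six frames and two-frame clips, the unshifted layer pairs frames (0, 1), (2, 3), (4, 5), while the shifted layer pairs (1, 2), (3, 4), (5, 0). Frames that sat in different clips in one layer share a clip in the next, which is how attention restricted to a clip can still propagate information across the whole sequence.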

Code Repositories

jingyunliang/vrt (Official, PyTorch, mentioned in GitHub)

Benchmarks

| Benchmark | Methodology | Metrics |
| --- | --- | --- |
| deblurring-on-based | VRT (GoPro) | ERQAv2.0: 0.74874; LPIPS: 0.08165; PSNR: 31.42945; SSIM: 0.94503; Subjective: 2.3854; VMAF: 66.72253 |
| deblurring-on-based | VRT (REDS) | ERQAv2.0: 0.75056; LPIPS: 0.08248; PSNR: 30.97878; SSIM: 0.94601; Subjective: 1.5660; VMAF: 66.81782 |
| deblurring-on-based-1 | VRT (GoPro) | PSNR: 31.42945; VMAF: 66.72253 |
| deblurring-on-based-1 | VRT (REDS) | ERQAv2.0: 0.74874; LPIPS: 0.08248; PSNR: 30.97878; SSIM: 0.94503; VMAF: 66.81782 |
| deblurring-on-dvd-1 | VRT | PSNR: 34.27 |
| deblurring-on-gopro | VRT | PSNR: 34.81; SSIM: 0.9724 |
| deblurring-on-reds | VRT | Average PSNR: 36.79 |
| space-time-video-super-resolution-on-vimeo90k | VRT | PSNR: 36.98; SSIM: 0.9439 |
| space-time-video-super-resolution-on-vimeo90k-1 | VRT | PSNR: 36.01; SSIM: 0.9434 |
| video-denoising-on-davis-sigma10 | VRT | PSNR: 40.82 |
| video-denoising-on-davis-sigma20 | VRT | PSNR: 38.15 |
| video-denoising-on-davis-sigma30 | VRT | PSNR: 36.52 |
| video-denoising-on-davis-sigma40 | VRT | PSNR: 35.32 |
| video-denoising-on-davis-sigma50 | VRT | PSNR: 34.36 |
| video-denoising-on-set8-sigma10 | VRT | PSNR: 37.88 |
| video-denoising-on-set8-sigma20 | VRT | PSNR: 35.02 |
| video-denoising-on-set8-sigma30 | VRT | PSNR: 33.35 |
| video-denoising-on-set8-sigma40 | VRT | PSNR: 32.15 |
| video-denoising-on-set8-sigma50 | VRT | PSNR: 31.22 |
| video-frame-interpolation-on-vid4-4x | VRT | PSNR: 27.46; Parameters: 4450000; SSIM: 0.8392 |
| video-super-resolution-on-msu-super-1 | VRT + uavs3e | BSQ-rate over ERQA: 6.619; BSQ-rate over LPIPS: 4.003; BSQ-rate over MS-SSIM: 1.982; BSQ-rate over PSNR: 5.862; BSQ-rate over Subjective Score: 2.511; BSQ-rate over VMAF: 1.425 |
| video-super-resolution-on-msu-super-1 | VRT + aomenc | BSQ-rate over ERQA: 12.289; BSQ-rate over LPIPS: 4.429; BSQ-rate over MS-SSIM: 2.797; BSQ-rate over PSNR: 10.075; BSQ-rate over Subjective Score: 2.631; BSQ-rate over VMAF: 1.733 |
| video-super-resolution-on-msu-super-1 | VRT + vvenc | BSQ-rate over ERQA: 18.333; BSQ-rate over LPIPS: 11.496; BSQ-rate over MS-SSIM: 0.836; BSQ-rate over PSNR: 5.777; BSQ-rate over Subjective Score: 2.235; BSQ-rate over VMAF: 0.652 |
| video-super-resolution-on-msu-super-1 | VRT + x265 | BSQ-rate over ERQA: 8.92; BSQ-rate over LPIPS: 11.329; BSQ-rate over MS-SSIM: 1.257; BSQ-rate over PSNR: 6.634; BSQ-rate over Subjective Score: 2.023; BSQ-rate over VMAF: 1.217 |
| video-super-resolution-on-msu-super-1 | VRT + x264 | BSQ-rate over ERQA: 1.578; BSQ-rate over LPIPS: 1.259; BSQ-rate over MS-SSIM: 0.662; BSQ-rate over PSNR: 1.09; BSQ-rate over Subjective Score: 1.245; BSQ-rate over VMAF: 0.7 |
| video-super-resolution-on-msu-video-upscalers | VRT-Reds-L | LPIPS: 0.343; PSNR: 31.01; SSIM: 0.869 |
| video-super-resolution-on-msu-vsr-benchmark | VRT | 1 - LPIPS: 0.929; ERQAv1.0: 0.758; FPS: 2.778; PSNR: 31.669; QRCRv1.0: 0.722; SSIM: 0.902; Subjective score: 7.628 |
| video-super-resolution-on-udm10-4x-upscaling | VRT | PSNR: 41.05; SSIM: 0.9737 |
| video-super-resolution-on-vid4-4x-upscaling | VRT | PSNR: 27.93; SSIM: 0.8425 |
| video-super-resolution-on-vid4-4x-upscaling-1 | VRT | PSNR: 29.42; SSIM: 0.8795 |
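
Most of the entries above report PSNR in dB. For reference, the standard definition is 10 * log10(MAX^2 / MSE), sketched below for flat sequences of pixel values. This is only the textbook formula: the individual benchmarks may evaluate on the RGB channels or on the luminance channel only, may crop borders, and may average per frame or per sequence, none of which is specified here. The function name `psnr` and the default peak of 255 are assumptions.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel positions.
    mse = sum((r - x) ** 2 for r, x in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    print(round(psnr([100, 100], [110, 90]), 2))
```

Because the scale is logarithmic, a fixed gain in dB corresponds to a fixed multiplicative reduction in MSE: a 2.16 dB improvement, for example, means the MSE shrinks by a factor of 10^(0.216), roughly 1.64.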
