Recurrent Video Restoration Transformer with Guided Deformable Attention


Abstract

Video restoration aims at restoring multiple high-quality frames from multiple low-quality frames. Existing video restoration methods generally fall into two extreme cases, i.e., they either restore all frames in parallel or restore the video frame by frame in a recurrent way, which would result in different merits and drawbacks. Typically, the former has the advantage of temporal information fusion. However, it suffers from large model size and intensive memory consumption; the latter has a relatively small model size as it shares parameters across frames; however, it lacks long-range dependency modeling ability and parallelizability. In this paper, we attempt to integrate the advantages of the two cases by proposing a recurrent video restoration transformer, namely RVRT. RVRT processes local neighboring frames in parallel within a globally recurrent framework, which can achieve a good trade-off between model size, effectiveness, and efficiency. Specifically, RVRT divides the video into multiple clips and uses the previously inferred clip feature to estimate the subsequent clip feature. Within each clip, different frame features are jointly updated with implicit feature aggregation. Across different clips, the guided deformable attention is designed for clip-to-clip alignment, which predicts multiple relevant locations from the whole inferred clip and aggregates their features by the attention mechanism. Extensive experiments on video super-resolution, deblurring, and denoising show that the proposed RVRT achieves state-of-the-art performance on benchmark datasets with balanced model size, testing memory and runtime.
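Since the abstract describes the clip-wise recurrence and guided deformable attention only at a high level, the following is a minimal PyTorch sketch of the general idea. It is not the authors' implementation: the module names (GuidedClipAttention, RecurrentClipModel), the pooled clip feature, and the plain dot-product attention used in place of guided deformable attention are illustrative assumptions.

```python
# Minimal sketch of the clip-wise recurrent scheme described in the abstract.
# Not the official RVRT code: names, shapes, and the simplified attention are
# illustrative assumptions only.
import torch
import torch.nn as nn


class GuidedClipAttention(nn.Module):
    """Toy stand-in for guided deformable attention: the previous clip's feature
    is aligned to the current clip via attention. (The real guided deformable
    attention predicts a small set of relevant locations and attends only to
    them; here we attend over all spatial positions as a placeholder.)"""

    def __init__(self, channels):
        super().__init__()
        self.to_q = nn.Conv2d(channels, channels, 1)
        self.to_kv = nn.Conv2d(channels, 2 * channels, 1)

    def forward(self, curr_feat, prev_feat):
        # curr_feat, prev_feat: (B, C, H, W) clip-level features
        b, c, h, w = curr_feat.shape
        q = self.to_q(curr_feat).flatten(2).transpose(1, 2)        # (B, HW, C)
        k, v = self.to_kv(prev_feat).chunk(2, dim=1)
        k = k.flatten(2).transpose(1, 2)                           # (B, HW, C)
        v = v.flatten(2).transpose(1, 2)                           # (B, HW, C)
        attn = torch.softmax(q @ k.transpose(1, 2) / c ** 0.5, dim=-1)
        return (attn @ v).transpose(1, 2).reshape(b, c, h, w)


class RecurrentClipModel(nn.Module):
    """Processes the video clip by clip: frames inside a clip are handled
    jointly, while information flows recurrently across clips. (For simplicity
    this toy model emits one output per clip, not per frame.)"""

    def __init__(self, channels=32, clip_len=2):
        super().__init__()
        self.clip_len = clip_len
        self.embed = nn.Conv2d(3, channels, 3, padding=1)
        self.align = GuidedClipAttention(channels)
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.out = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, video):
        # video: (B, T, 3, H, W), with T divisible by clip_len for simplicity
        b, t, _, h, w = video.shape
        prev, outputs = None, []
        for start in range(0, t, self.clip_len):
            clip = video[:, start:start + self.clip_len]           # (B, L, 3, H, W)
            # "Parallel within a clip": fold the frames into the batch dimension.
            feat = self.embed(clip.flatten(0, 1))                  # (B*L, C, H, W)
            feat = feat.view(b, self.clip_len, -1, h, w).mean(1)   # pooled clip feature
            if prev is not None:
                # "Recurrent across clips": align the previously inferred clip
                # feature to the current clip and fuse the two.
                feat = self.fuse(torch.cat([feat, self.align(feat, prev)], dim=1))
            prev = feat
            outputs.append(self.out(feat))
        return torch.stack(outputs, dim=1)


if __name__ == "__main__":
    model = RecurrentClipModel()
    frames = torch.randn(1, 6, 3, 32, 32)   # 6 frames -> 3 clips of length 2
    print(model(frames).shape)              # torch.Size([1, 3, 3, 32, 32])
```

In RVRT proper, the clip-to-clip alignment predicts multiple relevant locations in the previously inferred clip and aggregates only those sampled features with attention, rather than attending over every spatial position as the placeholder above does.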

Code Repositories

Ascend-Research/Turtle (PyTorch, mentioned in GitHub)
xg416/DATUM (PyTorch, mentioned in GitHub)
labshuhanggu/mia-vsr (PyTorch, mentioned in GitHub)
jingyunliang/rvrt (Official, PyTorch, mentioned in GitHub)

Benchmarks

Benchmark | Method | Metrics
analog-video-restoration-on-tape | RVRT | LPIPS: 0.117, PSNR: 32.47, SSIM: 0.896, VMAF: 72.41
deblurring-on-dvd-1 | RVRT | PSNR: 34.92, SSIM: 0.9738
video-denoising-on-davis-sigma10 | RVRT | PSNR: 40.57
video-denoising-on-davis-sigma20 | RVRT | PSNR: 38.05
video-denoising-on-davis-sigma30 | RVRT | PSNR: 36.57
video-denoising-on-davis-sigma40 | RVRT | PSNR: 35.47
video-denoising-on-davis-sigma50 | RVRT | PSNR: 34.57
video-denoising-on-set8-sigma10 | RVRT | PSNR: 37.53
video-denoising-on-set8-sigma20 | RVRT | PSNR: 34.83
video-denoising-on-set8-sigma30 | RVRT | PSNR: 33.30
video-denoising-on-set8-sigma40 | RVRT | PSNR: 32.21
video-denoising-on-set8-sigma50 | RVRT | PSNR: 31.33
video-deraining-on-vrds | RVRT | PSNR: 28.24, SSIM: 0.8857
video-super-resolution-on-udm10-4x-upscaling | RVRT | PSNR: 40.90, SSIM: 0.9729
video-super-resolution-on-vid4-4x-upscaling | RVRT | PSNR: 27.99, SSIM: 0.8462
video-super-resolution-on-vid4-4x-upscaling-1 | RVRT | PSNR: 29.54, SSIM: 0.8810
video-super-resolution-on-vimeo90k | RVRT | PSNR: 38.59, SSIM: 0.9576
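The PSNR values above are in decibels. For reference, the sketch below shows the standard PSNR formula; each benchmark's exact protocol (color space, border cropping, frame selection) varies, so treat this as a generic assumption rather than the evaluation code behind these numbers.

```python
import numpy as np

def psnr(restored, ground_truth, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    mse = np.mean((restored.astype(np.float64) - ground_truth.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

# Example: a frame corrupted with mild Gaussian noise scores lower PSNR.
clean = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
noisy = np.clip(clean + np.random.normal(0, 5, clean.shape), 0, 255).astype(np.uint8)
print(psnr(clean, noisy))
```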
