Recurrent Video Restoration Transformer with Guided Deformable Attention


Abstract

Video restoration aims at restoring multiple high-quality frames from multiple low-quality frames. Existing video restoration methods generally fall into two extreme cases, i.e., they either restore all frames in parallel or restore the video frame by frame in a recurrent way, which would result in different merits and drawbacks. Typically, the former has the advantage of temporal information fusion. However, it suffers from large model size and intensive memory consumption; the latter has a relatively small model size as it shares parameters across frames; however, it lacks long-range dependency modeling ability and parallelizability. In this paper, we attempt to integrate the advantages of the two cases by proposing a recurrent video restoration transformer, namely RVRT. RVRT processes local neighboring frames in parallel within a globally recurrent framework, which can achieve a good trade-off between model size, effectiveness, and efficiency. Specifically, RVRT divides the video into multiple clips and uses the previously inferred clip feature to estimate the subsequent clip feature. Within each clip, different frame features are jointly updated with implicit feature aggregation. Across different clips, the guided deformable attention is designed for clip-to-clip alignment, which predicts multiple relevant locations from the whole inferred clip and aggregates their features by the attention mechanism. Extensive experiments on video super-resolution, deblurring, and denoising show that the proposed RVRT achieves state-of-the-art performance on benchmark datasets with balanced model size, testing memory and runtime.
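Since the abstract describes the clip-wise recurrence and guided deformable attention only at a high level, the following is a minimal PyTorch sketch of the general idea. It is not the authors' implementation: the module names (GuidedClipAttention, RecurrentClipModel), the pooled clip feature, and the plain dot-product attention used in place of guided deformable attention are illustrative assumptions.

```python
# Minimal sketch of the clip-wise recurrent scheme described in the abstract.
# Not the official RVRT code: names, shapes, and the simplified attention are
# illustrative assumptions only.
import torch
import torch.nn as nn


class GuidedClipAttention(nn.Module):
    """Toy stand-in for guided deformable attention: the previous clip's feature
    is aligned to the current clip via attention. (The real guided deformable
    attention predicts a small set of relevant locations and attends only to
    them; here we attend over all spatial positions as a placeholder.)"""

    def __init__(self, channels):
        super().__init__()
        self.to_q = nn.Conv2d(channels, channels, 1)
        self.to_kv = nn.Conv2d(channels, 2 * channels, 1)

    def forward(self, curr_feat, prev_feat):
        # curr_feat, prev_feat: (B, C, H, W) clip-level features
        b, c, h, w = curr_feat.shape
        q = self.to_q(curr_feat).flatten(2).transpose(1, 2)        # (B, HW, C)
        k, v = self.to_kv(prev_feat).chunk(2, dim=1)
        k = k.flatten(2).transpose(1, 2)                           # (B, HW, C)
        v = v.flatten(2).transpose(1, 2)                           # (B, HW, C)
        attn = torch.softmax(q @ k.transpose(1, 2) / c ** 0.5, dim=-1)
        return (attn @ v).transpose(1, 2).reshape(b, c, h, w)


class RecurrentClipModel(nn.Module):
    """Processes the video clip by clip: frames inside a clip are handled
    jointly, while information flows recurrently across clips. (For simplicity
    this toy model emits one output per clip, not per frame.)"""

    def __init__(self, channels=32, clip_len=2):
        super().__init__()
        self.clip_len = clip_len
        self.embed = nn.Conv2d(3, channels, 3, padding=1)
        self.align = GuidedClipAttention(channels)
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.out = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, video):
        # video: (B, T, 3, H, W), with T divisible by clip_len for simplicity
        b, t, _, h, w = video.shape
        prev, outputs = None, []
        for start in range(0, t, self.clip_len):
            clip = video[:, start:start + self.clip_len]           # (B, L, 3, H, W)
            # "Parallel within a clip": fold the frames into the batch dimension.
            feat = self.embed(clip.flatten(0, 1))                  # (B*L, C, H, W)
            feat = feat.view(b, self.clip_len, -1, h, w).mean(1)   # pooled clip feature
            if prev is not None:
                # "Recurrent across clips": align the previously inferred clip
                # feature to the current clip and fuse the two.
                feat = self.fuse(torch.cat([feat, self.align(feat, prev)], dim=1))
            prev = feat
            outputs.append(self.out(feat))
        return torch.stack(outputs, dim=1)


if __name__ == "__main__":
    model = RecurrentClipModel()
    frames = torch.randn(1, 6, 3, 32, 32)   # 6 frames -> 3 clips of length 2
    print(model(frames).shape)              # torch.Size([1, 3, 3, 32, 32])
```

In RVRT proper, the clip-to-clip alignment predicts multiple relevant locations in the previously inferred clip and aggregates only those sampled features with attention, rather than attending over every spatial position as the placeholder above does.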

Code Repositories

Ascend-Research/Turtle (PyTorch, mentioned in GitHub)
xg416/DATUM (PyTorch, mentioned in GitHub)
labshuhanggu/mia-vsr (PyTorch, mentioned in GitHub)
jingyunliang/rvrt (Official, PyTorch, mentioned in GitHub)

Benchmarks

Benchmark | Method | Metrics
analog-video-restoration-on-tape | RVRT | LPIPS: 0.117, PSNR: 32.47, SSIM: 0.896, VMAF: 72.41
deblurring-on-dvd-1 | RVRT | PSNR: 34.92, SSIM: 0.9738
video-denoising-on-davis-sigma10 | RVRT | PSNR: 40.57
video-denoising-on-davis-sigma20 | RVRT | PSNR: 38.05
video-denoising-on-davis-sigma30 | RVRT | PSNR: 36.57
video-denoising-on-davis-sigma40 | RVRT | PSNR: 35.47
video-denoising-on-davis-sigma50 | RVRT | PSNR: 34.57
video-denoising-on-set8-sigma10 | RVRT | PSNR: 37.53
video-denoising-on-set8-sigma20 | RVRT | PSNR: 34.83
video-denoising-on-set8-sigma30 | RVRT | PSNR: 33.30
video-denoising-on-set8-sigma40 | RVRT | PSNR: 32.21
video-denoising-on-set8-sigma50 | RVRT | PSNR: 31.33
video-deraining-on-vrds | RVRT | PSNR: 28.24, SSIM: 0.8857
video-super-resolution-on-udm10-4x-upscaling | RVRT | PSNR: 40.90, SSIM: 0.9729
video-super-resolution-on-vid4-4x-upscaling | RVRT | PSNR: 27.99, SSIM: 0.8462
video-super-resolution-on-vid4-4x-upscaling-1 | RVRT | PSNR: 29.54, SSIM: 0.8810
video-super-resolution-on-vimeo90k | RVRT | PSNR: 38.59, SSIM: 0.9576
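The PSNR values above are in decibels. For reference, the sketch below shows the standard PSNR formula; each benchmark's exact protocol (color space, border cropping, frame selection) varies, so treat this as a generic assumption rather than the evaluation code behind these numbers.

```python
import numpy as np

def psnr(restored, ground_truth, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    mse = np.mean((restored.astype(np.float64) - ground_truth.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

# Example: a frame corrupted with mild Gaussian noise scores lower PSNR.
clean = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
noisy = np.clip(clean + np.random.normal(0, 5, clean.shape), 0, 255).astype(np.uint8)
print(psnr(clean, noisy))
```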
