Learning Joint Spatial-Temporal Transformations for Video Inpainting

Yanhong Zeng; Jianlong Fu; Hongyang Chao

Abstract

High-quality video inpainting that completes missing regions in video frames is a promising yet challenging task. State-of-the-art approaches adopt attention models to complete a frame by searching for missing content in reference frames, and then complete whole videos frame by frame. However, these approaches can suffer from inconsistent attention results along the spatial and temporal dimensions, which often leads to blurriness and temporal artifacts in videos. In this paper, we propose to learn a joint Spatial-Temporal Transformer Network (STTN) for video inpainting. Specifically, we simultaneously fill missing regions in all input frames by self-attention, and propose to optimize STTN with a spatial-temporal adversarial loss. To show the superiority of the proposed model, we conduct both quantitative and qualitative evaluations using standard stationary masks and more realistic moving object masks. Demo videos are available at https://github.com/researchmm/STTN.
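To make the idea of filling all input frames simultaneously by self-attention more concrete, here is a minimal sketch in plain Python. It is not the STTN implementation: it assumes each pixel position already carries a small feature vector, uses the features directly as queries, keys, and values (no learned projections, no masks, no adversarial loss), and the names `frames_to_tokens` and `joint_attention` are hypothetical. The only point it illustrates is that every position attends to every position in every frame in a single pass, rather than frame by frame.

```python
import math


def frames_to_tokens(frames):
    """Flatten frames[t][y][x] = feature vector into one token list.

    Also returns each token's (t, y, x) position so the output can be
    reshaped back into frames.
    """
    tokens, positions = [], []
    for t, frame in enumerate(frames):
        for y, row in enumerate(frame):
            for x, feat in enumerate(row):
                tokens.append(list(feat))
                positions.append((t, y, x))
    return tokens, positions


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def joint_attention(frames):
    """Scaled dot-product self-attention over tokens from ALL frames.

    Assumption of this sketch: queries, keys, and values are the raw
    features; a real model would use learned projections.
    """
    tokens, positions = frames_to_tokens(frames)
    dim = len(tokens[0])
    scale = math.sqrt(dim)

    # One row of attention weights per query token, spanning space and time.
    weights = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        weights.append(softmax(scores))

    # Each output token is a weighted sum of all tokens in the video.
    out_tokens = [
        [sum(w * v[d] for w, v in zip(row, tokens)) for d in range(dim)]
        for row in weights
    ]

    # Put the attended features back at their (t, y, x) positions.
    out = [[[None for _ in row] for row in frame] for frame in frames]
    for (t, y, x), vec in zip(positions, out_tokens):
        out[t][y][x] = vec
    return out, weights


if __name__ == "__main__":
    # Two 2x2 frames with 2-dimensional features (toy values).
    video = [
        [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 0.0]]],
        [[[0.5, 0.5], [1.0, 0.0]], [[0.0, 1.0], [1.0, 1.0]]],
    ]
    out, weights = joint_attention(video)
    print(len(weights), "tokens, each attending to", len(weights[0]), "tokens")
```

Because the attention matrix covers all frames at once, its size grows quadratically with the number of tokens in the whole clip; the sketch ignores this cost.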

Code Repositories

Feynman1999/MgeEditing (mentioned in GitHub)
researchmm/STTN (official, PyTorch, mentioned in GitHub)

Benchmarks

| Benchmark | Methodology | Metrics |
| --- | --- | --- |
| seeing-beyond-the-visible-on-kitti360-ex | STTN | Average PSNR: 18.73 |
| video-inpainting-on-davis | STTN | Ewarp: 0.1449; PSNR: 30.67; SSIM: 0.9560; VFID: 0.149 |
| video-inpainting-on-hqvi-240p | STTN | LPIPS: 0.0528; PSNR: 29.64; SSIM: 0.9339; VFID: 0.2594 |
| video-inpainting-on-youtube-vos | STTN | Ewarp: 0.0907; PSNR: 32.34; SSIM: 0.9655; VFID: 0.053 |
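
For readers unfamiliar with the PSNR figures above, the following sketch computes the standard peak signal-to-noise ratio, 10 * log10(peak^2 / MSE), between a reference and a restored signal. It is only an illustration of the metric's definition: the function name `psnr`, the flat-list input, and the default peak value of 255 are assumptions of this sketch, and the benchmarks' exact evaluation protocol (peak value, masking, averaging over frames) is not stated here.

```python
import math


def psnr(reference, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists.

    `peak` is the maximum possible pixel value (assumed 255 for 8-bit data).
    """
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    ref = [10.0, 20.0, 30.0, 40.0]
    out = [11.0, 19.0, 31.0, 39.0]  # every pixel off by one, so MSE = 1
    print(round(psnr(ref, out), 2))
```

Higher PSNR means the restored frames are closer to the reference in a per-pixel sense; it does not capture temporal consistency, which is why the table also lists measures such as Ewarp and VFID.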
