Latent Video Diffusion Models for High-Fidelity Long Video Generation

Yingqing He, Tianyu Yang, Yong Zhang, Ying Shan, Qifeng Chen

Abstract

AI-generated content has attracted considerable attention recently, but photo-realistic video synthesis remains challenging. Although many GAN-based and autoregressive approaches have been proposed, the visual quality and length of the generated videos are still far from satisfactory. Diffusion models have recently shown remarkable results but require significant computational resources. To address this, we introduce lightweight video diffusion models that operate in a low-dimensional 3D latent space and significantly outperform previous pixel-space video diffusion models under a limited computational budget. In addition, we propose hierarchical diffusion in the latent space so that videos of more than one thousand frames can be produced. To further counter the performance degradation that arises in long video generation, we propose conditional latent perturbation and unconditional guidance, which effectively mitigate the errors accumulated as the video is extended. Extensive experiments on small-domain datasets of different categories suggest that our framework generates more realistic and longer videos than previous strong baselines. We additionally provide an extension to large-scale text-to-video generation to demonstrate the superiority of our work. Our code and models will be made publicly available.
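
The abstract names two devices for limiting error accumulation, conditional latent perturbation and unconditional guidance, without giving formulas. The following is a minimal Python sketch of how such devices are commonly realized, not the authors' implementation. It assumes that perturbation means noising the conditioning latents with a standard diffusion forward step, and that guidance is a classifier-free-guidance-style blend of conditional and unconditional noise predictions. The function names, the `alpha_bar` and `scale` parameters, and the flat-list latents are all hypothetical.

```python
import math
import random


def perturb_condition(latent, alpha_bar, rng):
    """Noise a conditioning latent as sqrt(a) * z + sqrt(1 - a) * eps.

    ASSUMPTION: `alpha_bar` in [0, 1] stands in for a noise-schedule value;
    1.0 means no perturbation. `latent` is a flat list of floats.
    """
    if not 0.0 <= alpha_bar <= 1.0:
        raise ValueError("alpha_bar must lie in [0, 1]")
    keep = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [keep * z + noise * rng.gauss(0.0, 1.0) for z in latent]


def guided_prediction(eps_cond, eps_uncond, scale):
    """Blend conditional and unconditional noise predictions.

    ASSUMPTION: classifier-free-guidance-style rule u + scale * (c - u).
    scale = 0 gives the unconditional prediction, scale = 1 the conditional
    one, and scale > 1 extrapolates past the conditional prediction.
    """
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


# Toy usage with made-up numbers; a real model would supply the predictions.
rng = random.Random(0)
cond = [0.5, -1.0, 2.0]
noisy_cond = perturb_condition(cond, alpha_bar=0.9, rng=rng)
blended = guided_prediction([1.0, 2.0], [0.0, 1.0], scale=2.0)
```

In this sketch, noising the conditioning frames during training would expose the model to imperfect conditions resembling its own earlier outputs, while the guidance weight controls how strongly each new clip depends on those possibly degraded frames.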

Code Repositories

yingqinghe/lvdm (official, PyTorch)

Benchmarks

| Benchmark | Method | FVD16 | KVD16 |
|---|---|---|---|
| Sky Time-lapse | TATS (128x128) | 132.6 | 5.7 |
| Sky Time-lapse | Long-video GAN (128x128) | 107.5 | - |
| Sky Time-lapse | MoCoGAN-HD (128x128) | 183.6 | 13.9 |
| Sky Time-lapse | Long-video GAN (256x256) | 116.5 | - |
| Sky Time-lapse | DIGAN (128x128) | 114.6 | 6.8 |
| Sky Time-lapse | LVDM (256x256) | 95.2 | 3.9 |
| Taichi | DIGAN (256x256) | 156.7 | - |
| Taichi | LVDM (256x256) | 99 | 15.3 |
| Taichi | TATS (128x128) | 94.6 | 9.8 |
| Taichi | MoCoGAN-HD (128x128) | 144.7 | 25.4 |
| Taichi | DIGAN (128x128) | 128.1 | 20.6 |
| UCF-101 | LVDM (256x256, unconditional) | 552 | 42 |
| UCF-101 | VDM | 1396 | 116 |
| UCF-101 | MCVD | 2460 | 148 |
| UCF-101 | TGAN-v2 (128x128) | 1209 | - |
| UCF-101 | LVDM (256x256, unconditional) | 372 | 27 |
