Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models

Songwei Ge, Seungjun Nah, Guilin Liu, Tyler Poon, Andrew Tao, Bryan Catanzaro, David Jacobs, Jia-Bin Huang, Ming-Yu Liu, Yogesh Balaji

Abstract

Despite tremendous progress in generating high-quality images using diffusion models, synthesizing a sequence of animated frames that are both photorealistic and temporally coherent is still in its infancy. While off-the-shelf billion-scale datasets for image generation are available, collecting similar video data of the same scale is still challenging. Also, training a video diffusion model is computationally much more expensive than its image counterpart. In this work, we explore finetuning a pretrained image diffusion model with video data as a practical solution for the video synthesis task. We find that naively extending the image noise prior to video noise prior in video diffusion leads to sub-optimal performance. Our carefully designed video noise prior leads to substantially better performance. Extensive experimental validation shows that our model, Preserve Your Own Correlation (PYoCo), attains SOTA zero-shot text-to-video results on the UCF-101 and MSR-VTT benchmarks. It also achieves SOTA video generation quality on the small-scale UCF-101 benchmark with a $10\times$ smaller model using significantly less computation than the prior art.
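The abstract does not spell out how the video noise prior is constructed, so the following is only a minimal sketch of one way to make per-frame noise correlated instead of independent: each frame's noise is the sum of a component shared by all frames and a component drawn independently per frame. The function names `mixed_noise` and `correlation`, the parameter `alpha`, and the variance split are assumptions made for illustration and should not be read as the authors' exact formulation. The split is chosen so that every noise value keeps unit variance, which is what an image diffusion model is normally trained with.

```python
import math
import random


def mixed_noise(num_frames, frame_size, alpha, rng):
    """Sample correlated noise for a clip (illustrative assumption).

    Each frame = shared component + independent component, with variances
    alpha^2 / (1 + alpha^2) and 1 / (1 + alpha^2), so every element keeps
    unit variance. alpha = 0 reduces to fully independent per-frame noise.
    """
    shared_std = alpha / math.sqrt(1.0 + alpha ** 2)
    ind_std = 1.0 / math.sqrt(1.0 + alpha ** 2)
    # One shared noise vector reused by every frame of the clip.
    shared = [rng.gauss(0.0, shared_std) for _ in range(frame_size)]
    # Each frame adds its own independent noise on top of the shared part.
    return [
        [s + rng.gauss(0.0, ind_std) for s in shared]
        for _ in range(num_frames)
    ]


def correlation(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    rng = random.Random(0)
    frames = mixed_noise(num_frames=2, frame_size=20000, alpha=2.0, rng=rng)
    # In expectation the frame-to-frame correlation is alpha^2 / (1 + alpha^2).
    print(correlation(frames[0], frames[1]))
```

Under this construction, a larger `alpha` ties the frames' noise more tightly together, while `alpha = 0` recovers the naive extension of the image prior in which every frame is sampled independently.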

Benchmarks

Benchmark | Methodology | Metrics
text-to-video-generation-on-ucf-101 | PYoCo (Zero-shot, 64x64) | FVD16: 355.19
video-generation-on-ucf-101 | PYoCo (Zero-shot, 64x64, text-conditional) | FVD16: 355.19; Inception Score: 47.76
video-generation-on-ucf-101 | PYoCo (Zero-shot, 64x64, unconditional) | FVD16: 310; Inception Score: 60.01
