Lumiere: A Space-Time Diffusion Model for Video Generation

Abstract

We introduce Lumiere -- a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion -- a pivotal challenge in video synthesis. To this end, we introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model. This is in contrast to existing video models which synthesize distant keyframes followed by temporal super-resolution -- an approach that inherently makes global temporal consistency difficult to achieve. By deploying both spatial and (importantly) temporal down- and up-sampling and leveraging a pre-trained text-to-image diffusion model, our model learns to directly generate a full-frame-rate, low-resolution video by processing it in multiple space-time scales. We demonstrate state-of-the-art text-to-video generation results, and show that our design easily facilitates a wide range of content creation tasks and video editing applications, including image-to-video, video inpainting, and stylized generation.
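
To make the space-time down- and up-sampling idea concrete, here is a minimal, hypothetical sketch in PyTorch. The module names, layer choices, and tensor shapes are illustrative assumptions and not Lumiere's actual implementation; the sketch only shows how a video tensor shaped (batch, channels, frames, height, width) can be compressed and re-expanded in both the spatial and the temporal dimensions within a single pass, which is the property the Space-Time U-Net relies on.

```python
# Illustrative sketch only; not Lumiere's actual code.
# Assumes 5-D video tensors shaped (batch, channels, frames, height, width).
import torch
import torch.nn as nn


class SpaceTimeDownsample(nn.Module):
    """Halve spatial and temporal resolution with one strided 3-D convolution."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv3d(channels, channels, kernel_size=3,
                              stride=(2, 2, 2), padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, T, H, W) -> (B, C, T/2, H/2, W/2)
        return self.conv(x)


class SpaceTimeUpsample(nn.Module):
    """Double spatial and temporal resolution via interpolation plus a 3-D conv."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Nearest-neighbor upsampling along frames, height, and width.
        x = nn.functional.interpolate(x, scale_factor=(2, 2, 2), mode="nearest")
        return self.conv(x)


if __name__ == "__main__":
    video = torch.randn(1, 64, 16, 32, 32)   # 16 frames at 32x32, 64 channels
    down = SpaceTimeDownsample(64)
    up = SpaceTimeUpsample(64)
    coarse = down(video)                      # (1, 64, 8, 16, 16)
    restored = up(coarse)                     # (1, 64, 16, 32, 32)
    print(coarse.shape, restored.shape)
```

In a full model, several such blocks would be nested so the network processes the clip at multiple space-time scales; the key contrast with keyframe-plus-temporal-super-resolution pipelines is that the whole temporal duration is present in the network at once.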

Code Repositories

Benchmarks

Benchmark | Methodology | Metrics
text-to-video-generation-on-ucf-101 | Lumiere (Zero-shot, 1024x1024) | FVD16: 332.49
video-generation-on-ucf-101 | Lumiere (Zero-shot, 1024x1024, text-conditional) | FVD16: 332.49; Inception Score: 37.54
