7 months ago

Junfei Xiao, Ceyuan Yang, Lvmin Zhang, Shengqu Cai, Yang Zhao, Yuwei Guo, Gordon Wetzstein, Maneesh Agrawala, Alan Yuille, Lu Jiang

Abstract

We present Captain Cinema, a framework for short movie generation. Given a detailed textual description of a movie storyline, our approach first generates a sequence of keyframes that outline the entire narrative, which ensures long-range coherence in both the storyline and visual appearance (e.g., scenes and characters). We refer to this step as top-down keyframe planning. These keyframes then serve as conditioning signals for a video synthesis model, which supports long-context learning, to produce the spatio-temporal dynamics between them. This step is referred to as bottom-up video synthesis. To support stable and efficient generation of multi-scene, long-narrative cinematic works, we introduce an interleaved training strategy for Multimodal Diffusion Transformers (MM-DiT), specifically adapted for long-context video data. Our model is trained on a specially curated cinematic dataset consisting of interleaved data pairs. Our experiments demonstrate that Captain Cinema performs favorably in the automated creation of visually coherent and narratively consistent short movies, with high quality and efficiency. Project page: https://thecinema.ai
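
The two-stage control flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every name below (`Keyframe`, `Clip`, `plan_keyframes`, `synthesize_between`, `generate_movie`) is hypothetical, the planner simply splits the storyline into sentences, and the "video model" returns string placeholders instead of frames. It also conditions each clip only on its two neighbouring keyframes, which is a simplification of the long-context conditioning the abstract mentions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Keyframe:
    index: int
    caption: str  # stand-in for a generated image


@dataclass
class Clip:
    start: Keyframe
    end: Keyframe
    frames: List[str]  # stand-ins for synthesized video frames


def plan_keyframes(storyline: str) -> List[Keyframe]:
    """Top-down stage (placeholder): one keyframe per storyline sentence."""
    beats = [s.strip() for s in storyline.split(".") if s.strip()]
    return [Keyframe(i, beat) for i, beat in enumerate(beats)]


def synthesize_between(start: Keyframe, end: Keyframe, frames_per_clip: int = 4) -> Clip:
    """Bottom-up stage (placeholder): fill in the motion between two keyframes."""
    frames = [f"{start.index}->{end.index}:t{t}" for t in range(frames_per_clip)]
    return Clip(start, end, frames)


def generate_movie(storyline: str, frames_per_clip: int = 4) -> List[Clip]:
    """Plan all keyframes first, then synthesize a clip for each adjacent pair."""
    keyframes = plan_keyframes(storyline)
    return [
        synthesize_between(a, b, frames_per_clip)
        for a, b in zip(keyframes, keyframes[1:])
    ]


if __name__ == "__main__":
    story = "A sailor leaves port. A storm hits. The ship reaches an island."
    for clip in generate_movie(story):
        print(clip.start.caption, "->", clip.end.caption, len(clip.frames))
```

The point of the sketch is the ordering: the full keyframe sequence is fixed before any clip is synthesized, so every clip is anchored to a globally planned start and end.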