VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning
Minghong Cai Qiulin Wang Zongli Ye Wenze Liu Quande Liu Weicai Ye Xintao Wang Pengfei Wan Kun Gai Xiangyu Yue

Abstract
We introduce the task of arbitrary spatio-temporal video completion, where a video is generated from arbitrary, user-specified patches placed at any spatial location and timestamp, akin to painting on a video canvas. This flexible formulation naturally unifies many existing controllable video generation tasks, including first-frame image-to-video, inpainting, extension, and interpolation, under a single, cohesive paradigm. Realizing this vision, however, faces a fundamental obstacle in modern latent video diffusion models: the temporal ambiguity introduced by causal VAEs, where multiple pixel frames are compressed into a single latent representation, making precise frame-level conditioning structurally difficult. We address this challenge with VideoCanvas, a novel framework that adapts the In-Context Conditioning (ICC) paradigm to this fine-grained control task with zero new parameters. We propose a hybrid conditioning strategy that decouples spatial and temporal control: spatial placement is handled via zero-padding, while temporal alignment is achieved through Temporal RoPE Interpolation, which assigns each condition a continuous fractional position within the latent sequence. This resolves the VAE's temporal ambiguity and enables pixel-frame-aware control on a frozen backbone. To evaluate this new capability, we develop VideoCanvasBench, the first benchmark for arbitrary spatio-temporal video completion, covering both intra-scene fidelity and inter-scene creativity. Experiments demonstrate that VideoCanvas significantly outperforms existing conditioning paradigms, establishing a new state of the art in flexible and unified video generation.
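To make the temporal ambiguity and the idea of a fractional position concrete, here is a minimal Python sketch. It is illustrative only and is not the authors' implementation: the abstract does not state the VAE's temporal stride or the exact interpolation formula, so the stride of 4, the frame grouping (first frame alone, then groups of `stride` frames), the linear mapping `frame_idx / stride`, and all function names are assumptions. The angle computation follows the standard RoPE form, evaluated at a non-integer position.

```python
import math


def containing_latent(frame_idx: int, stride: int = 4) -> int:
    """Integer latent slot that holds a pixel frame.

    Assumes a causal VAE layout in which frame 0 is encoded alone and
    every following group of `stride` frames shares one latent.
    """
    if frame_idx < 0:
        raise ValueError("frame_idx must be non-negative")
    return 0 if frame_idx == 0 else math.ceil(frame_idx / stride)


def fractional_latent_position(frame_idx: int, stride: int = 4) -> float:
    """Continuous position on the latent time axis.

    Assumption: a simple linear mapping; the paper may use a different one.
    """
    if frame_idx < 0:
        raise ValueError("frame_idx must be non-negative")
    return frame_idx / stride


def rope_angles(position: float, dim: int = 8, base: float = 10000.0) -> list[float]:
    """Standard RoPE rotation angles, evaluated at a possibly fractional position."""
    return [position / base ** (2 * i / dim) for i in range(dim // 2)]


# Frames 5 and 8 fall into the same integer latent slot, so integer
# indexing cannot tell them apart; fractional positions can.
same_slot = containing_latent(5) == containing_latent(8)
pos_5 = fractional_latent_position(5)
pos_8 = fractional_latent_position(8)
angles_5 = rope_angles(pos_5)
angles_8 = rope_angles(pos_8)
```

Under these assumptions, `same_slot` is `True` while `pos_5` and `pos_8` differ (1.25 and 2.0), so the two conditions receive different rotary angles even though the VAE would compress their frames into one latent.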