Zhaoxi Chen, Tianqi Liu, Long Zhuo, Jiawei Ren, Zeng Tao, He Zhu, Fangzhou Hong, Liang Pan, Ziwei Liu

Abstract
We present 4DNeX, the first feed-forward framework for generating 4D (i.e., dynamic 3D) scene representations from a single image. In contrast to existing methods that rely on computationally intensive optimization or require multi-frame video inputs, 4DNeX enables efficient, end-to-end image-to-4D generation by fine-tuning a pretrained video diffusion model. Specifically, 1) to alleviate the scarcity of 4D data, we construct 4DNeX-10M, a large-scale dataset with high-quality 4D annotations generated using advanced reconstruction approaches; 2) we introduce a unified 6D video representation that jointly models RGB and XYZ sequences, facilitating structured learning of both appearance and geometry; 3) we propose a set of simple yet effective adaptation strategies to repurpose pretrained video diffusion models for 4D modeling. 4DNeX produces high-quality dynamic point clouds that enable novel-view video synthesis. Extensive experiments demonstrate that 4DNeX outperforms existing 4D generation methods in efficiency and generalizability, offering a scalable solution for image-to-4D modeling and laying the foundation for generative 4D world models that simulate dynamic scene evolution.
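To make the unified 6D representation concrete, here is a minimal sketch of how per-frame RGB and XYZ (per-pixel 3D coordinate) maps could be packed into a single 6-channel video tensor. The function name, tensor layout, and min-max normalization are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def pack_6d_video(rgb: np.ndarray, xyz: np.ndarray) -> np.ndarray:
    """Concatenate RGB frames and XYZ coordinate maps into a 6-channel video.

    rgb: (T, H, W, 3) color frames, values in [0, 1].
    xyz: (T, H, W, 3) per-pixel 3D coordinates, e.g., produced by a
         reconstruction method (layout and normalization are assumptions).
    Returns a (T, H, W, 6) array pairing appearance and geometry at every
    pixel, so a video diffusion model can denoise both modalities jointly.
    """
    assert rgb.shape == xyz.shape, "RGB and XYZ sequences must align"
    # Normalize XYZ to [0, 1] so both modalities share one value range
    # (a simplifying assumption; the paper may normalize differently).
    xyz_min, xyz_max = xyz.min(), xyz.max()
    xyz_norm = (xyz - xyz_min) / (xyz_max - xyz_min + 1e-8)
    return np.concatenate([rgb, xyz_norm], axis=-1)

# Example: an 8-frame, 64x64 clip with random placeholder data.
rgb = np.random.rand(8, 64, 64, 3)
xyz = np.random.randn(8, 64, 64, 3)
video_6d = pack_6d_video(rgb, xyz)
print(video_6d.shape)  # (8, 64, 64, 6)
```

Coupling the two modalities channel-wise in this way lets a single sequence model attend to appearance and geometry at the same spatial locations, which is the structured learning the abstract refers to.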