Human-VDM: Learning Single-Image 3D Human Gaussian Splatting from Video Diffusion Models
Zhibin Liu, Haoye Dong, Aviral Chharia, Hefeng Wu

Abstract
Generating lifelike 3D humans from a single RGB image remains a challenging task in computer vision, as it requires accurate modeling of geometry, high-quality texture, and plausible unseen parts. Existing methods typically use multi-view diffusion models for 3D generation, but they often suffer from inconsistent views, which hinders high-quality 3D human generation. To address this, we propose Human-VDM, a novel method for generating 3D humans from a single RGB image using Video Diffusion Models. Human-VDM provides temporally consistent views for 3D human generation using Gaussian Splatting. It consists of three modules: a view-consistent human video diffusion module, a video augmentation module, and a Gaussian Splatting module. First, a single image is fed into the human video diffusion module to generate a coherent human video. Next, the video augmentation module applies super-resolution and video interpolation to enhance the texture and geometric smoothness of the generated video. Finally, the 3D Human Gaussian Splatting module learns lifelike humans under the guidance of these high-resolution and view-consistent frames. Experiments demonstrate that Human-VDM generates high-quality 3D humans from a single image, outperforming state-of-the-art methods in both generation quality and quantity. Project page: https://human-vdm.github.io/Human-VDM/
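The abstract describes a three-stage pipeline: video diffusion from one image, video augmentation, then Gaussian Splatting fitting. The minimal Python sketch below illustrates only that data flow; every function name, signature, view count, and scale factor is an illustrative assumption (the page does not expose the authors' API), and each stage body is a labeled placeholder rather than a real model.

```python
# Hypothetical sketch of the Human-VDM three-module data flow.
# All interfaces here are assumptions; stage bodies are stand-ins.
import numpy as np


def human_video_diffusion(image: np.ndarray, num_views: int = 24) -> np.ndarray:
    """Stage 1 (assumed interface): generate a temporally consistent orbit
    video of the person from a single RGB image."""
    # Placeholder for the human video diffusion model: repeat the input frame.
    return np.stack([image] * num_views)


def augment_video(frames: np.ndarray, sr_scale: int = 2) -> np.ndarray:
    """Stage 2 (assumed interface): super-resolution plus frame interpolation
    to sharpen texture and smooth the view trajectory."""
    # Placeholder SR: nearest-neighbor upsampling stands in for an SR network.
    up = frames.repeat(sr_scale, axis=1).repeat(sr_scale, axis=2)
    # Placeholder interpolation: midpoint blends stand in for a video
    # interpolation network, doubling the temporal sampling.
    mids = ((up[:-1].astype(np.float32) + up[1:].astype(np.float32)) / 2).astype(up.dtype)
    out = np.empty((len(up) + len(mids), *up.shape[1:]), dtype=up.dtype)
    out[0::2] = up
    out[1::2] = mids
    return out


def fit_gaussian_splats(frames: np.ndarray) -> dict:
    """Stage 3 (assumed interface): optimize a 3D Gaussian Splatting
    representation against the augmented, view-consistent frames."""
    # Placeholder for the 3D Human Gaussian Splatting optimization loop.
    return {"num_gaussians": 100_000, "supervision_frames": len(frames)}


if __name__ == "__main__":
    rgb = np.zeros((512, 512, 3), dtype=np.uint8)  # stand-in input image
    video = human_video_diffusion(rgb)             # (24, 512, 512, 3)
    video_hq = augment_video(video)                # (47, 1024, 1024, 3)
    print(fit_gaussian_splats(video_hq))
```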
Benchmarks
| Benchmark | Methodology | CLIP Similarity | LPIPS | PSNR | SSIM |
|---|---|---|---|---|---|
| Lifelike 3D Human Generation on THuman2.0 | Human-VDM | 0.9235 | 0.0957 | 20.068 | 0.9228 |
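These four metrics are standard image-similarity scores between a rendered view and its ground-truth photo. The sketch below shows one common way to compute them; the library choices (scikit-image, the `lpips` package, and a Hugging Face CLIP checkpoint) and the exact evaluation protocol are my assumptions, not details taken from the paper.

```python
# Hedged sketch: per-image-pair metrics as they are commonly computed.
import numpy as np
import torch
from skimage.metrics import peak_signal_noise_ratio, structural_similarity
import lpips  # pip install lpips
from transformers import CLIPModel, CLIPProcessor


def evaluate_pair(render: np.ndarray, gt: np.ndarray) -> dict:
    """render, gt: HxWx3 uint8 images of the same size."""
    psnr = peak_signal_noise_ratio(gt, render, data_range=255)
    ssim = structural_similarity(gt, render, channel_axis=-1, data_range=255)

    # LPIPS expects NCHW float tensors scaled to [-1, 1].
    def to_tensor(im: np.ndarray) -> torch.Tensor:
        return torch.from_numpy(im).permute(2, 0, 1)[None].float() / 127.5 - 1.0

    lpips_fn = lpips.LPIPS(net="alex")
    lp = lpips_fn(to_tensor(render), to_tensor(gt)).item()
    return {"PSNR": psnr, "SSIM": ssim, "LPIPS": lp}


def clip_similarity(render: np.ndarray, gt: np.ndarray) -> float:
    """Cosine similarity between CLIP image embeddings (checkpoint assumed)."""
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    with torch.no_grad():
        feats = model.get_image_features(**proc(images=[render, gt], return_tensors="pt"))
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return float(feats[0] @ feats[1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.integers(0, 256, (256, 256, 3), dtype=np.uint8)
    b = rng.integers(0, 256, (256, 256, 3), dtype=np.uint8)
    print(evaluate_pair(a, b))
```

In practice, such scores are averaged over all evaluation views and subjects in the benchmark; higher is better for CLIP Similarity, PSNR, and SSIM, while lower is better for LPIPS.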