SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion
Ho Hsuan-I ; Song Jie ; Hilliges Otmar

Abstract
A long-standing goal of 3D human reconstruction is to create lifelike and fully detailed 3D humans from single-view images. The main challenge lies in inferring unknown body shapes, appearances, and clothing details in areas not visible in the images. To address this, we propose SiTH, a novel pipeline that uniquely integrates an image-conditioned diffusion model into a 3D mesh reconstruction workflow. At the core of our method lies the decomposition of the challenging single-view reconstruction problem into generative hallucination and reconstruction subproblems. For the former, we employ a powerful generative diffusion model to hallucinate unseen back-view appearance based on the input images. For the latter, we leverage skinned body meshes as guidance to recover full-body texture meshes from the input and back-view images. SiTH requires as few as 500 3D human scans for training while maintaining its generality and robustness to diverse images. Extensive evaluations on two 3D human benchmarks, including our newly created one, highlighted our method's superior accuracy and perceptual quality in 3D textured human reconstruction. Our code and evaluation benchmark are available at https://ait.ethz.ch/sith
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| 3d-human-reconstruction-on-4d-dress | SiTH_Inner | Chamfer (cm): 2.110; IoU: 0.755; Normal Consistency: 0.824 |
| 3d-human-reconstruction-on-4d-dress | SiTH_Outer | Chamfer (cm): 2.322; IoU: 0.749; Normal Consistency: 0.794 |
| 3d-human-reconstruction-on-customhumans | SiTH | Chamfer Distance P-to-S: 1.871; Chamfer Distance S-to-P: 2.045; Normal Consistency: 0.826; f-Score: 37.029 |
| lifelike-3d-human-generation-on-thuman2-0 | SiTH | CLIP Similarity: 0.8978; LPIPS: 0.1396; PSNR: 17.0533; SSIM: 0.8963 |
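
For readers unfamiliar with the Chamfer entries in the table, the following is a minimal sketch of how a directed and a symmetric Chamfer distance are commonly computed between two point sets. It is not the benchmarks' evaluation code. It assumes both surfaces have already been sampled into 3D points, that "P-to-S" and "S-to-P" denote the two directed distances between prediction and ground-truth scan, and that the symmetric value is the mean of the two; the function names `directed_distance` and `chamfer_distance` and the toy points are hypothetical.

```python
import math

def directed_distance(src, dst):
    """Mean distance from each point in src to its nearest point in dst."""
    # Brute-force nearest neighbour; fine for a toy example, too slow for real scans.
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance, taken here as the mean of both directions (an assumption)."""
    return 0.5 * (directed_distance(pred, gt) + directed_distance(gt, pred))

# Toy point sets (hypothetical, unitless) standing in for sampled surfaces.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]

p_to_s = directed_distance(pred, gt)  # prediction -> ground truth
s_to_p = directed_distance(gt, pred)  # ground truth -> prediction
cd = chamfer_distance(pred, gt)
print(p_to_s, s_to_p, cd)
```

On real meshes, the nearest-neighbour search would typically use a spatial index such as a k-d tree, and distances would be reported in the scan's units (centimetres in the 4D-Dress rows above).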