Yandong Wen, Weiyang Liu, Bhiksha Raj, Rita Singh

Abstract
We present a conditional estimation (CEST) framework to learn 3D facial parameters from 2D single-view images by self-supervised training from videos. CEST is based on the process of analysis by synthesis, where the 3D facial parameters (shape, reflectance, viewpoint, and illumination) are estimated from the face image, and then recombined to reconstruct the 2D face image. In order to learn semantically meaningful 3D facial parameters without explicit access to their labels, CEST couples the estimation of different 3D facial parameters by taking their statistical dependency into account. Specifically, the estimation of any 3D facial parameter is not only conditioned on the given image, but also on the facial parameters that have already been derived. Moreover, the reflectance symmetry and consistency among the video frames are adopted to improve the disentanglement of facial parameters. Together with a novel strategy for incorporating the reflectance symmetry and consistency, CEST can be efficiently trained with in-the-wild video clips. Both qualitative and quantitative experiments demonstrate the effectiveness of CEST.
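To make the conditional-estimation idea concrete, the sketch below illustrates how each 3D facial parameter can be predicted from the image features together with the parameters already derived, and how reflectance symmetry and cross-frame consistency can be expressed as losses. This is a minimal PyTorch illustration only; the module names, dimensions, estimation order, and exact loss forms are assumptions for exposition, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ConditionalEstimator(nn.Module):
    """Estimates 3D facial parameters one at a time, each conditioned on the
    image features and on all parameters derived so far (the CEST idea)."""
    def __init__(self, feat_dim=512, shape_dim=199, refl_dim=199,
                 view_dim=6, light_dim=27):
        super().__init__()
        # Toy image encoder standing in for a real CNN backbone.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # Each head sees the image features plus the previously estimated parameters.
        self.shape_head = nn.Linear(feat_dim, shape_dim)
        self.refl_head = nn.Linear(feat_dim + shape_dim, refl_dim)
        self.view_head = nn.Linear(feat_dim + shape_dim + refl_dim, view_dim)
        self.light_head = nn.Linear(feat_dim + shape_dim + refl_dim + view_dim, light_dim)

    def forward(self, image):
        f = self.backbone(image)
        shape = self.shape_head(f)                                  # shape | image
        refl = self.refl_head(torch.cat([f, shape], dim=-1))        # reflectance | image, shape
        view = self.view_head(torch.cat([f, shape, refl], dim=-1))  # viewpoint | image, shape, refl
        light = self.light_head(torch.cat([f, shape, refl, view], dim=-1))
        return shape, refl, view, light

def reflectance_symmetry_loss(refl_map):
    """Penalizes asymmetry of a predicted reflectance map by comparing it with
    its horizontal flip; a stand-in for the symmetry cue in the abstract."""
    return (refl_map - torch.flip(refl_map, dims=[-1])).abs().mean()

def reflectance_consistency_loss(refl_a, refl_b):
    """Penalizes differences between reflectance estimates from two frames of
    the same video, reflecting the consistency cue in the abstract."""
    return (refl_a - refl_b).abs().mean()

# Example: estimate parameters for a batch of face crops.
model = ConditionalEstimator()
shape, refl, view, light = model(torch.rand(4, 3, 224, 224))
```

In a full analysis-by-synthesis pipeline, these parameters would be passed to a differentiable face renderer and trained against the input frames with reconstruction losses; that part is omitted here.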
Benchmarks
| Benchmark | Methodology | Cheek | Forehead | Mouth | Nose | All |
|---|---|---|---|---|---|---|
| 3d-face-reconstruction-on-realy | CEST | 1.456 (±0.485) | 2.384 (±0.578) | 1.448 (±0.406) | 2.779 (±0.835) | 2.017 |