CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior
Jinbo Xing, Menghan Xia, Yuechen Zhang, Xiaodong Cun, Jue Wang, Tien-Tsin Wong

Abstract
Speech-driven 3D facial animation has been widely studied, yet there is still a gap to achieving realism and vividness due to the highly ill-posed nature and scarcity of audio-visual data. Existing works typically formulate the cross-modal mapping into a regression task, which suffers from the regression-to-mean problem leading to over-smoothed facial motions. In this paper, we propose to cast speech-driven facial animation as a code query task in a finite proxy space of the learned codebook, which effectively promotes the vividness of the generated motions by reducing the cross-modal mapping uncertainty. The codebook is learned by self-reconstruction over real facial motions and thus embedded with realistic facial motion priors. Over the discrete motion space, a temporal autoregressive model is employed to sequentially synthesize facial motions from the input speech signal, which guarantees lip-sync as well as plausible facial expressions. We demonstrate that our approach outperforms current state-of-the-art methods both qualitatively and quantitatively. Also, a user study further justifies our superiority in perceptual quality.
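The discrete motion prior rests on vector quantization: each continuous motion feature is replaced by its nearest codebook entry, so the speech-to-motion mapping only has to choose among a finite set of realistic motions rather than regress in a continuous space. A minimal sketch of that quantization step (the dimensions and the random codebook are illustrative assumptions, not the paper's learned model):

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature vector to its nearest codebook entry
    (Euclidean distance); returns discrete indices and the
    quantized vectors. Standard VQ lookup, not the trained
    motion prior itself."""
    # distances: (T, K) between T feature frames and K codebook entries
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)        # one discrete code per frame
    return idx, codebook[idx]     # quantized motion features

rng = np.random.default_rng(0)
codebook = rng.normal(size=(64, 8))   # K=64 codes of dim 8 (placeholder)
feats = rng.normal(size=(10, 8))      # 10 "frames" of motion features
idx, quantized = quantize(feats, codebook)
```

In the full method, the codebook is learned by self-reconstruction over real facial motions, and an autoregressive model predicts the index sequence `idx` from speech instead of computing it from ground-truth features.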
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| 3d-face-animation-on-beat2 | CodeTalker | MSE: 8.026 |
| 3d-face-animation-on-biwi-3d-audiovisual | CodeTalker | FDD: 4.1170 Lip Vertex Error: 4.7914 |