5 months ago

FaceFormer: Speech-Driven 3D Facial Animation with Transformers

Fan Yingruo ; Lin Zhaojiang ; Saito Jun ; Wang Wenping ; Komura Taku

Abstract

Speech-driven 3D facial animation is challenging due to the complex geometry of human faces and the limited availability of 3D audio-visual data. Prior works typically focus on learning phoneme-level features of short audio windows with limited context, occasionally resulting in inaccurate lip movements. To tackle this limitation, we propose a Transformer-based autoregressive model, FaceFormer, which encodes the long-term audio context and autoregressively predicts a sequence of animated 3D face meshes. To cope with the data scarcity issue, we integrate self-supervised pre-trained speech representations. Also, we devise two biased attention mechanisms well suited to this specific task: the biased cross-modal multi-head (MH) attention and the biased causal MH self-attention with a periodic positional encoding strategy. The former effectively aligns the audio-motion modalities, whereas the latter offers the ability to generalize to longer audio sequences. Extensive experiments and a perceptual user study show that our approach outperforms the existing state-of-the-art methods. The code will be made available.
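
The abstract names a periodic positional encoding and a biased causal self-attention but gives no formulas. The sketch below is one plausible reading, not the authors' implementation. It assumes a standard sinusoidal encoding whose frame index wraps every `period` frames, and an additive attention bias that blocks future frames and applies a step-wise penalty to older periods. The function names, the `period` parameter, and the exact form of the penalty are hypothetical and chosen only for illustration.

```python
import math


def periodic_positional_encoding(num_frames, d_model, period):
    """Sinusoidal encoding whose position index wraps every `period` frames.

    Assumption: wrapping the index (t % period) is what makes the encoding
    "periodic"; frames beyond the first period reuse earlier encodings.
    """
    assert d_model % 2 == 0, "d_model must be even for sin/cos pairs"
    table = []
    for t in range(num_frames):
        pos = t % period
        row = []
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table


def biased_causal_mask(num_frames, period):
    """Additive attention bias of shape (num_frames, num_frames).

    Assumption: future frames (j > i) get -inf so they are never attended
    to, and past frames get a penalty that grows by 1 for every full
    period separating query frame i from key frame j.
    """
    mask = []
    for i in range(num_frames):
        row = []
        for j in range(num_frames):
            if j > i:
                row.append(float("-inf"))
            else:
                row.append(float(-((i - j) // period)))
        mask.append(row)
    return mask


if __name__ == "__main__":
    pe = periodic_positional_encoding(num_frames=8, d_model=4, period=3)
    print(pe[0] == pe[3])  # frames one period apart share an encoding
    for row in biased_causal_mask(num_frames=4, period=2):
        print(row)
```

Because the encoding repeats and the bias depends only on the relative distance between frames, neither function is tied to the sequence lengths seen in training, which matches the abstract's claim about generalizing to longer audio sequences.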

Code Repositories

EvelynFan/FaceFormer (official, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
3d-face-animation-on-beat2 | FaceFormer | MSE: 7.787
3d-face-animation-on-biwi-3d-audiovisual | FaceFormer | FDD: 4.6408; Lip Vertex Error: 5.3077
3d-face-animation-on-vocaset | FaceFormer | Lip Vertex Error: 5.3742
