FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning

Haque Kazi Injamamul ; Yumak Zerrin

Abstract

This paper presents FaceXHuBERT, a text-less speech-driven 3D facial animation generation method that captures personalized and subtle cues in speech (e.g. identity, emotion and hesitation). It is also very robust to background noise and can handle audio recorded in a variety of situations (e.g. multiple people speaking). Recent approaches employ end-to-end deep learning that takes both audio and text as input to generate facial animation for the whole face. However, the scarcity of publicly available expressive audio-3D facial animation datasets poses a major bottleneck. The resulting animations still have issues regarding accurate lip-syncing, expressivity, person-specific information and generalizability. We employ a self-supervised pretrained HuBERT model in the training process, which allows us to incorporate both lexical and non-lexical information in the audio without using a large lexicon. Additionally, guiding the training with a binary emotion condition and speaker identity helps distinguish even the subtlest facial motion. We carried out extensive objective and subjective evaluation in comparison to ground truth and state-of-the-art work. A perceptual user study demonstrates that our approach produces superior results with respect to the realism of the animation 78% of the time in comparison to the state of the art. In addition, our method is 4 times faster, as it eliminates the use of complex sequential models such as transformers. We strongly recommend watching the supplementary video before reading the paper. We also provide the implementation and evaluation code via a GitHub repository link.

Code Repositories

galib360/facexhubert (official, tagged jax)

Benchmarks

| Benchmark | Methodology | Metrics |
|---|---|---|
| 3d-face-animation-on-biwi-3d-audiovisual | FaceXHuBERT | FDD: 4.96; Lip Vertex Error: 4.56 |
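
For readers unfamiliar with the Lip Vertex Error metric listed above, the following is a minimal sketch of how it is commonly computed in the speech-driven facial animation literature: for each frame, take the maximum Euclidean distance between predicted and ground-truth positions over the lip vertices, then average over frames. This is an assumption about the usual definition, not the paper's evaluation code; the function name `lip_vertex_error`, the data layout, and the `lip_idx` argument are hypothetical, and the unit and scale of the reported 4.56 are not stated on this page.

```python
import math

def lip_vertex_error(pred, truth, lip_idx):
    """Mean over frames of the max per-vertex L2 error on the lip region.

    pred, truth: sequences of frames; each frame is a sequence of (x, y, z)
                 vertex positions (hypothetical layout, for illustration).
    lip_idx:     indices of the vertices that belong to the lip region.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same number of frames")
    if not pred or not lip_idx:
        raise ValueError("need at least one frame and one lip vertex")

    per_frame_max = []
    for p_frame, t_frame in zip(pred, truth):
        # Largest Euclidean error among lip vertices in this frame.
        worst = max(math.dist(p_frame[i], t_frame[i]) for i in lip_idx)
        per_frame_max.append(worst)

    # Average the per-frame worst-case errors over the sequence.
    return sum(per_frame_max) / len(per_frame_max)


# Toy usage: 2 frames, 3 vertices, vertices 0 and 2 treated as lip vertices.
truth = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 2.0, 2.0)],
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 2.0, 2.0)],
]
pred = [
    [(3.0, 4.0, 0.0), (9.0, 9.0, 9.0), (2.0, 2.0, 2.0)],  # lip max error 5
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 2.0, 3.0)],  # lip max error 1
]
lve = lip_vertex_error(pred, truth, lip_idx=[0, 2])
```

Because only the indices in `lip_idx` are compared, the large error on vertex 1 in the first toy frame does not affect the result.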
