5 months ago

EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling

Abstract

We propose EMAGE, a framework to generate full-body human gestures from audio and masked gestures, encompassing facial, local body, hands, and global movements. To achieve this, we first introduce BEAT2 (BEAT-SMPLX-FLAME), a new mesh-level holistic co-speech dataset. BEAT2 combines a MoShed SMPL-X body with FLAME head parameters and further refines the modeling of head, neck, and finger movements, offering a community-standardized, high-quality 3D motion-captured dataset. EMAGE leverages masked body gesture priors during training to boost inference performance. It involves a Masked Audio Gesture Transformer, facilitating joint training on audio-to-gesture generation and masked gesture reconstruction to effectively encode audio and body gesture hints. Encoded body hints from masked gestures are then separately employed to generate facial and body movements. Moreover, EMAGE adaptively merges speech features from the audio's rhythm and content and utilizes four compositional VQ-VAEs to enhance the results' fidelity and diversity. Experiments demonstrate that EMAGE generates holistic gestures with state-of-the-art performance and is flexible in accepting predefined spatial-temporal gesture inputs, generating complete, audio-synchronized results. Our code and dataset are available at https://pantomatrix.github.io/EMAGE/
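
The abstract mentions VQ-VAEs without giving their internals. As a rough illustration only, the core quantization step of a generic VQ-VAE can be sketched as a nearest-neighbor lookup in a learned codebook. The function name `quantize`, the array shapes, and the toy values below are assumptions made for this sketch, not details taken from EMAGE or the PantoMatrix code.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to its nearest codebook entry.

    latents:  (N, D) array of encoder outputs (hypothetical shape).
    codebook: (K, D) array of learned code vectors (hypothetical shape).
    Returns the chosen code indices (N,) and the quantized vectors (N, D).
    """
    # Squared Euclidean distance between every latent and every code: (N, K).
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]

# Toy example with a 2-entry codebook in 2 dimensions.
toy_codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
toy_latents = np.array([[0.1, -0.1], [0.9, 1.2]])
toy_indices, toy_quantized = quantize(toy_latents, toy_codebook)
```

In a setup with several VQ-VAEs, each one would presumably hold its own codebook and run this lookup independently on its own latents; how EMAGE assigns motion components to its four VQ-VAEs is not stated in the abstract.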

Code Repositories

PantoMatrix/PantoMatrix
Official
pytorch
Mentioned in GitHub

Benchmarks

Benchmark                     Methodology   Metrics
gesture-generation-on-beat2   EMAGE         FGD: 0.5512
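
FGD (Fréchet Gesture Distance) is commonly defined, by analogy with FID, as the Fréchet distance between Gaussians fitted to latent features of real and generated gestures. The sketch below shows that distance for two feature arrays. The feature extractor, and any scaling applied to the reported value, are not given on this page, so the function name `frechet_distance` and its inputs are assumptions for illustration.

```python
import numpy as np

def frechet_distance(feats_real, feats_gen):
    """Fréchet distance between Gaussians fitted to two (N, D) feature sets."""
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)
    # atleast_2d keeps the covariance a matrix even when D == 1.
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))
    # tr((cov_r cov_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g; clip tiny negative values from round-off.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)

# Toy usage with random stand-in features (not real gesture features).
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(200, 4))
same_score = frechet_distance(real_feats, real_feats)
shifted_score = frechet_distance(real_feats, real_feats + 1.0)
```

Identical feature sets give a distance near zero, and shifting every feature by a constant changes only the mean term, so lower values indicate generated gestures whose feature statistics are closer to the real data.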
