5 months ago

EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling

Liu Haiyang ; Zhu Zihao ; Becherini Giorgio ; Peng Yichen ; Su Mingyang ; Zhou You ; Zhe Xuefei ; Iwamoto Naoya ; Zheng Bo ; Black

Abstract

We propose EMAGE, a framework to generate full-body human gestures from audio and masked gestures, encompassing facial, local body, hands, and global movements. To achieve this, we first introduce BEAT2 (BEAT-SMPLX-FLAME), a new mesh-level holistic co-speech dataset. BEAT2 combines a MoShed SMPL-X body with FLAME head parameters and further refines the modeling of head, neck, and finger movements, offering a community-standardized, high-quality 3D motion captured dataset. EMAGE leverages masked body gesture priors during training to boost inference performance. It involves a Masked Audio Gesture Transformer, facilitating joint training on audio-to-gesture generation and masked gesture reconstruction to effectively encode audio and body gesture hints. Encoded body hints from masked gestures are then separately employed to generate facial and body movements. Moreover, EMAGE adaptively merges speech features from the audio's rhythm and content and utilizes four compositional VQ-VAEs to enhance the results' fidelity and diversity. Experiments demonstrate that EMAGE generates holistic gestures with state-of-the-art performance and is flexible in accepting predefined spatial-temporal gesture inputs, generating complete, audio-synchronized results. Our code and dataset are available at https://pantomatrix.github.io/EMAGE/
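
The masked gesture reconstruction objective mentioned in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the frame-level random masking scheme, the 0.4 mask ratio, and the zero placeholder are all assumptions made for illustration. The idea shown is only that some gesture frames are hidden, and the reconstruction error is measured on the hidden frames.

```python
import random


def mask_gesture_frames(gestures, mask_ratio=0.4, mask_value=0.0, seed=None):
    """Hide a random subset of frames (hypothetical masking scheme).

    gestures: list of frames, each a list of floats (pose parameters).
    Returns (masked_gestures, mask); mask[t] is True if frame t was hidden.
    """
    rng = random.Random(seed)
    num_frames = len(gestures)
    num_masked = int(round(num_frames * mask_ratio))
    masked_idx = set(rng.sample(range(num_frames), num_masked))
    mask = [t in masked_idx for t in range(num_frames)]
    masked = [
        [mask_value] * len(frame) if mask[t] else list(frame)
        for t, frame in enumerate(gestures)
    ]
    return masked, mask


def masked_reconstruction_loss(predicted, target, mask):
    """Mean squared error over the hidden frames only."""
    total, count = 0.0, 0
    for pred_frame, tgt_frame, is_masked in zip(predicted, target, mask):
        if not is_masked:
            continue  # visible frames act as hints, not as targets
        for p, g in zip(pred_frame, tgt_frame):
            total += (p - g) ** 2
            count += 1
    return total / count if count else 0.0


# Toy sequence: 10 frames with 2 pose parameters each.
gestures = [[float(t), float(t) + 0.5] for t in range(10)]
masked, mask = mask_gesture_frames(gestures, mask_ratio=0.4, seed=0)
```

In a real model the visible frames would be passed, together with audio features, to a network that predicts the hidden frames; here the loss function simply compares any predicted sequence with the ground truth.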

Code Repositories

PantoMatrix/PantoMatrix (official, PyTorch; mentioned in GitHub)

Benchmarks

Benchmark	Methodology	Metrics
gesture-generation-on-beat2	EMAGE	FGD: 0.5512
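
FGD in the table is assumed here to mean Fréchet Gesture Distance, a Fréchet distance between Gaussian fits of feature vectors from real and generated gestures (lower is better). The sketch below is a simplified illustration, not the benchmark's evaluation code: it assumes diagonal covariances so that no matrix square root is needed, and the function names and toy features are hypothetical. The full metric uses complete covariance matrices, with the term Tr(S_r + S_g - 2 (S_r S_g)^(1/2)), and features from a pretrained gesture encoder.

```python
import math


def mean_and_variance(features):
    """Per-dimension mean and (population) variance of a list of vectors."""
    n = len(features)
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in features) / n for d in range(dim)]
    return mean, var


def diagonal_frechet_distance(real_features, generated_features):
    """Fréchet distance between two Gaussians with diagonal covariances.

    Simplifying assumption: off-diagonal covariance terms are ignored.
    """
    mu_r, var_r = mean_and_variance(real_features)
    mu_g, var_g = mean_and_variance(generated_features)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_g))
    cov_term = sum(
        vr + vg - 2.0 * math.sqrt(vr * vg) for vr, vg in zip(var_r, var_g)
    )
    return mean_term + cov_term


# Toy features: the generated set is the real set shifted by 1 in dimension 0.
real = [[0.0, 1.0], [2.0, 3.0]]
generated = [[1.0, 1.0], [3.0, 3.0]]
distance = diagonal_frechet_distance(real, generated)
```

With identical feature sets the distance is zero; in the toy example only the means differ, so the distance equals the squared mean shift.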
