4 months ago

GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling

Liu Pinxin ; Song Luchuan ; Huang Junhua ; Liu Haiyang ; Xu Chenliang

Abstract

Generating full-body human gestures from speech signals remains challenging in terms of both quality and speed. Existing approaches model different body regions, such as the body, legs, and hands, separately; they therefore fail to capture the spatial interactions between these regions and produce unnatural, disjointed movements. Additionally, their autoregressive or diffusion-based pipelines generate slowly because they require dozens of inference steps. To address these two challenges, we propose GestureLSM, a flow-matching-based approach for co-speech gesture generation with spatial-temporal modeling. Our method i) explicitly models the interaction of tokenized body regions through spatial and temporal attention to generate coherent full-body gestures, and ii) introduces flow matching to enable more efficient sampling by explicitly modeling the latent velocity space. To overcome the suboptimal performance of the flow matching baseline, we propose latent shortcut learning and beta-distribution time stamp sampling during training to enhance gesture synthesis quality and accelerate inference. Combining spatial-temporal modeling with the improved flow-matching-based framework, GestureLSM achieves state-of-the-art performance on BEAT2 while significantly reducing inference time compared to existing methods, highlighting its potential for enhancing digital humans and embodied agents in real-world applications. Project Page: https://andypinxinliu.github.io/GestureLSM
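The abstract names flow matching with beta-distributed time stamp sampling but gives no formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a straight-line path between a noise sample `x0` and a data sample `x1` (a common flow matching choice, not confirmed by the abstract), and the Beta parameters `alpha=2.0, beta=1.2` are hypothetical placeholders. All function names are illustrative.

```python
import random


def sample_time(alpha=2.0, beta=1.2, rng=random):
    """Draw a training time t in [0, 1] from a Beta distribution.

    The shape parameters are assumed values for illustration only.
    """
    return rng.betavariate(alpha, beta)


def interpolate(x0, x1, t):
    """Point at time t on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity regression target along the straight path: x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    x0, x1 = [0.0, 1.0], [2.0, -1.0]
    t = sample_time()
    print("t =", t, "x_t =", interpolate(x0, x1, t))
    # With the exact (constant) velocity, a handful of Euler steps reaches x1.
    print(euler_sample(lambda x, t: target_velocity(x0, x1), x0, num_steps=4))
```

In training, a network would be regressed onto `target_velocity` at points produced by `interpolate` with `t` drawn from `sample_time`; at inference, `euler_sample` with a small `num_steps` illustrates why a well-learned velocity field can need far fewer steps than a diffusion sampler.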

Code Repositories

andypinxinliu/GestureLSM
Official
pytorch
Mentioned in GitHub

Benchmarks

Benchmark | Methodology | Metrics
gesture-generation-on-beat2 | GestureLSM | FGD: 0.4040
