HyperAI

5 months ago

TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and Texts

Guo Chuan; Zuo Xinxin; Wang Sen; Cheng Li


Abstract

Inspired by the strong ties between vision and language, the two intimate human sensing and communication modalities, our paper aims to explore the generation of 3D human full-body motions from texts, as well as its reciprocal task, abbreviated as text2motion and motion2text, respectively. To tackle existing challenges, in particular to enable the generation of multiple distinct motions from the same text and to avoid the undesirable production of trivial, motionless pose sequences, we propose the use of motion tokens, a discrete and compact motion representation. This provides a level playing field when considering both motion and text signals, as motion tokens and text tokens, respectively. Moreover, our motion2text module is integrated into the inverse alignment process of our text2motion training pipeline, where a significant deviation of the synthesized text from the input text is penalized by a large training loss; empirically, this is shown to effectively improve performance. Finally, the mappings between the two modalities of motions and texts are facilitated by adapting the neural model for machine translation (NMT) to our context. This autoregressive modeling of the distribution over discrete motion tokens further enables the non-deterministic production of pose sequences of variable lengths from an input text. Our approach is flexible and can be used for both the text2motion and motion2text tasks. Empirical evaluations on two benchmark datasets demonstrate the superior performance of our approach on both tasks over a variety of state-of-the-art methods. Project page: https://ericguo5513.github.io/TM2T/
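Two mechanisms from the abstract can be illustrated in a few lines: turning continuous motion features into discrete motion tokens, and autoregressively sampling a token sequence whose length the model decides itself by emitting an end token. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a vector-quantization codebook with nearest-neighbour lookup (the abstract only says the tokens are "discrete and compact"). The codebook here is random rather than learned, and the learned motion encoder and decoder are omitted, so raw feature vectors are quantized directly. `next_token_probs` and `toy_model` are hypothetical stand-ins for a trained NMT-style model conditioned on the input text.

```python
import numpy as np


def quantize(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each feature vector (shape T x D) to the index of its nearest code (codebook K x D)."""
    # Pairwise squared Euclidean distances between time steps and codes, shape (T, K).
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    # Each time step becomes one discrete motion token: the index of its closest code.
    return dists.argmin(axis=1)


def dequantize(tokens: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Replace each token by its code vector; a learned decoder would then produce poses."""
    return codebook[tokens]


def sample_tokens(next_token_probs, end_token: int, max_len: int,
                  rng: np.random.Generator) -> list:
    """Sample motion tokens one at a time until the end token appears or max_len is hit.

    next_token_probs(prefix) must return a probability vector over all token ids,
    including end_token. Because every step is a random draw, repeated calls can
    yield different sequences of different lengths for the same conditioning.
    """
    tokens = []
    while len(tokens) < max_len:
        probs = next_token_probs(tokens)
        tok = int(rng.choice(len(probs), p=probs))
        if tok == end_token:  # the model chose to stop, which fixes the motion length
            break
        tokens.append(tok)
    return tokens


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    codebook = rng.normal(size=(8, 4))   # 8 hypothetical codes of dimension 4
    features = codebook[[3, 3, 5, 1]]    # toy "motion features" that coincide with codes
    print(quantize(features, codebook))  # -> [3 3 5 1]

    # Hypothetical stand-in for a trained text-conditioned decoder:
    # uniform over the 8 codes plus an end token with id 8.
    def toy_model(prefix):
        return np.full(9, 1 / 9)

    for _ in range(3):
        print(sample_tokens(toy_model, end_token=8, max_len=20, rng=rng))
```

The sampling loop is what makes generation both stochastic and variable-length: the same conditioning can produce several distinct token sequences, and a sequence ends whenever the end token is drawn rather than at a fixed length.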

Code Repositories

EricGuo5513/TM2T (official PyTorch implementation)

Benchmarks

Benchmark | Methodology | Metrics
motion-captioning-on-humanml3d | TM2T | BERTScore: 37.8; BLEU-4: 22.3
motion-captioning-on-kit-motion-language | TM2T | BERTScore: 23.0; BLEU-4: 18.4
motion-synthesis-on-humanml3d | TM2T | Diversity: 8.589; FID: 1.501; Multimodality: 2.424; R Precision Top3: 0.729
motion-synthesis-on-humanml3d | Text2Gesture | Diversity: 6.409; FID: 5.012; R Precision Top3: 0.345
motion-synthesis-on-humanml3d | Language2Pose | Diversity: 7.676; FID: 11.02; R Precision Top3: 0.486
motion-synthesis-on-kit-motion-language | Text2Gesture | Diversity: 9.334; FID: 12.12; R Precision Top3: 0.338
motion-synthesis-on-kit-motion-language | TM2T | Diversity: 9.473; FID: 3.599; Multimodality: 3.292; R Precision Top3: 0.587
motion-synthesis-on-kit-motion-language | Language2Pose | Diversity: 9.073; FID: 6.545; R Precision Top3: 0.483
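Synthesis metrics such as R Precision and Diversity are usually computed on embeddings produced by pretrained motion and text feature extractors. The following sketch shows how these two metrics can be computed under several assumptions. The embeddings are given as NumPy arrays, and the feature extractors themselves are not reproduced. For R Precision, each motion's paired text is ranked by Euclidean distance within a pool of candidates. A pool size of 32 is assumed, following the common HumanML3D evaluation protocol. Diversity is taken as the mean distance between randomly drawn pairs of generated-motion embeddings. FID and Multimodality are not covered, and the function names are illustrative.

```python
import numpy as np


def r_precision(motion_emb: np.ndarray, text_emb: np.ndarray,
                top_k: int = 3, pool_size: int = 32) -> float:
    """Fraction of motions whose paired text ranks in the top_k of its pool.

    Row i of motion_emb is paired with row i of text_emb. Rows are split into
    consecutive pools of pool_size; within a pool, the other texts act as
    mismatched candidates. An incomplete final pool is dropped.
    """
    n = (len(motion_emb) // pool_size) * pool_size
    if n == 0:
        raise ValueError("need at least pool_size paired embeddings")
    hits = 0
    for start in range(0, n, pool_size):
        m = motion_emb[start:start + pool_size]
        t = text_emb[start:start + pool_size]
        # Euclidean distance from each motion (rows) to each text (columns) in the pool.
        dist = np.linalg.norm(m[:, None, :] - t[None, :, :], axis=-1)
        # Indices of the top_k closest texts for every motion.
        nearest = np.argsort(dist, axis=1)[:, :top_k]
        # A hit means the paired text (same row index) is among them.
        hits += int((nearest == np.arange(pool_size)[:, None]).any(axis=1).sum())
    return hits / n


def diversity(motion_emb: np.ndarray, num_pairs: int, rng: np.random.Generator) -> float:
    """Mean Euclidean distance between num_pairs randomly drawn pairs of embeddings."""
    idx_a = rng.choice(len(motion_emb), size=num_pairs, replace=False)
    idx_b = rng.choice(len(motion_emb), size=num_pairs, replace=False)
    return float(np.linalg.norm(motion_emb[idx_a] - motion_emb[idx_b], axis=1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text = rng.normal(size=(64, 16))                 # hypothetical text embeddings
    motion = text + 0.1 * rng.normal(size=(64, 16))  # motions closely aligned with their texts
    print(r_precision(motion, text, top_k=3))        # should be at or near 1.0 for such pairs
    print(diversity(motion, num_pairs=30, rng=rng))
```

Ranking within a fixed-size pool makes R Precision comparable across methods: a score of 0.729 at Top3 would mean the paired text lands among the three closest candidates for about 73% of generated motions.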
