MotionBERT: A Unified Perspective on Learning Human Motion Representations

Zhu Wentao ; Ma Xiaoxuan ; Liu Zhaoyang ; Liu Libin ; Wu Wayne ; Wang Yizhou

Abstract

We present a unified perspective on tackling various human-centric video tasks by learning human motion representations from large-scale and heterogeneous data resources. Specifically, we propose a pretraining stage in which a motion encoder is trained to recover the underlying 3D motion from noisy partial 2D observations. The motion representations acquired in this way incorporate geometric, kinematic, and physical knowledge about human motion, which can be easily transferred to multiple downstream tasks. We implement the motion encoder with a Dual-stream Spatio-temporal Transformer (DSTformer) neural network. It could capture long-range spatio-temporal relationships among the skeletal joints comprehensively and adaptively, exemplified by the lowest 3D pose estimation error so far when trained from scratch. Furthermore, our proposed framework achieves state-of-the-art performance on all three downstream tasks by simply finetuning the pretrained motion encoder with a simple regression head (1-2 layers), which demonstrates the versatility of the learned motion representations. Code and models are available at https://motionbert.github.io/
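The pretraining objective described in the abstract (recovering 3D motion from noisy, partial 2D observations) needs corrupted 2D inputs paired with clean 3D targets. The following is a minimal sketch of how such inputs could be produced, not the authors' implementation. The function name `corrupt_2d`, the orthographic projection (dropping the depth coordinate), the masking probability, and the noise level are all assumptions made for illustration; this page does not state the values or the corruption scheme the paper uses.

```python
import random


def corrupt_2d(motion_3d, mask_prob=0.15, noise_std=0.01, seed=None):
    """Turn a 3D joint sequence into noisy, partial 2D observations.

    motion_3d: list of frames, each a list of (x, y, z) joint positions.
    Returns (observations, mask): observations holds (x, y) per joint,
    mask holds True where a joint was hidden (its observation is zeroed).

    Hypothetical helper: projection, mask_prob and noise_std are
    illustrative assumptions, not values taken from the paper.
    """
    rng = random.Random(seed)
    observations, mask = [], []
    for frame in motion_3d:
        obs_frame, mask_frame = [], []
        for x, y, _z in frame:
            if rng.random() < mask_prob:
                # Joint is hidden: the encoder sees a zero placeholder.
                obs_frame.append((0.0, 0.0))
                mask_frame.append(True)
            else:
                # Joint is visible: drop depth and add Gaussian jitter.
                obs_frame.append((x + rng.gauss(0.0, noise_std),
                                  y + rng.gauss(0.0, noise_std)))
                mask_frame.append(False)
        observations.append(obs_frame)
        mask.append(mask_frame)
    return observations, mask


# Two frames with two joints each (made-up coordinates).
clean = [[(0.1, 0.2, 1.0), (0.3, 0.4, 1.1)],
         [(0.1, 0.25, 1.0), (0.3, 0.45, 1.1)]]
noisy_2d, hidden = corrupt_2d(clean, seed=0)
```

In a training loop, `noisy_2d` would be the encoder input and `clean` the regression target, so the network has to infer both the missing depth and the hidden joints.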

Code Repositories

Walter0807/MotionBERT (official, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
3d-human-pose-estimation-on-3dpw | MotionBERT-HybrIK | MPJPE: 68.8; MPVPE: 79.4; PA-MPJPE: 40.6
3d-human-pose-estimation-on-3dpw | MotionBERT (Finetune) | MPJPE: 76.9; MPVPE: 88.1; PA-MPJPE: 47.2
3d-human-pose-estimation-on-human36m | MotionBERT (Finetune) | #Frames: 243; Average MPJPE (mm): 16.9; Multi-View or Monocular: Monocular; Using 2D ground-truth joints: Yes
classification-on-full-body-parkinsons | MotionBERT | F1-score (weighted): 0.47
classification-on-full-body-parkinsons | MotionBERT-LITE | F1-score (weighted): 0.43
monocular-3d-human-pose-estimation-on-human3 | MotionBERT (Scratch) | 2D detector: SH; Average MPJPE (mm): 39.2; Frames Needed: 243; Need Ground Truth 2D Pose: No; Use Video Sequence: Yes
monocular-3d-human-pose-estimation-on-human3 | MotionBERT (Finetune) | 2D detector: SH; Average MPJPE (mm): 37.5; Frames Needed: 243; Need Ground Truth 2D Pose: No; Use Video Sequence: Yes
one-shot-3d-action-recognition-on-ntu-rgbd | MotionBERT (Finetune) | Accuracy: 67.4%
skeleton-based-action-recognition-on-ntu-rgbd | MotionBert (finetune) | Accuracy (CS): 93.0; Accuracy (CV): 97.2
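
Several of the benchmarks above report MPJPE (mean per-joint position error), which is the average Euclidean distance between predicted and ground-truth 3D joint positions, typically in millimetres. As a rough illustration of the metric, here is a minimal single-pose sketch in plain Python. The function name `mpjpe` and the sample coordinates are made up for this example; official evaluation scripts usually also average over frames and may align root joints first. PA-MPJPE additionally applies a Procrustes alignment before measuring the error, which is not shown here.

```python
import math


def mpjpe(pred, target):
    """Mean per-joint position error for one pose.

    pred, target: equal-length lists of (x, y, z) joint positions.
    Returns the mean Euclidean distance, in the same unit as the inputs.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of joints")
    if not pred:
        raise ValueError("at least one joint is required")
    # Sum the per-joint distances, then average over the joints.
    total = sum(math.dist(p, t) for p, t in zip(pred, target))
    return total / len(pred)


# Hypothetical two-joint pose in millimetres (illustrative values only).
pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
target = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(mpjpe(pred, target))  # 2.5
```

The first joint matches exactly and the second is off by 5 mm, so the mean over the two joints is 2.5 mm.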
