5 months ago

SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation

Abstract

Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications. Despite encouraging progress, current state-of-the-art methods still depend largely on a confined set of training datasets. In this work, we investigate scaling up EHPS towards the first generalist foundation model (dubbed SMPLer-X), with up to ViT-Huge as the backbone and training with up to 4.5M instances from diverse data sources. With big data and the large model, SMPLer-X exhibits strong performance across diverse test benchmarks and excellent transferability to even unseen environments. 1) For the data scaling, we perform a systematic investigation on 32 EHPS datasets, including a wide range of scenarios that a model trained on any single dataset cannot handle. More importantly, capitalizing on insights obtained from the extensive benchmarking process, we optimize our training scheme and select datasets that lead to a significant leap in EHPS capabilities. 2) For the model scaling, we take advantage of vision transformers to study the scaling law of model sizes in EHPS. Moreover, our finetuning strategy turns SMPLer-X into specialist models, allowing them to achieve further performance boosts. Notably, our foundation model SMPLer-X consistently delivers state-of-the-art results on seven benchmarks such as AGORA (107.2 mm NMVE), UBody (57.4 mm PVE), EgoBody (63.6 mm PVE), and EHF (62.3 mm PVE without finetuning). Homepage: https://caizhongang.github.io/projects/SMPLer-X/

Code Repositories

wqyin/smplest-x (pytorch; mentioned in GitHub)
caizhongang/SMPLer-X (official; pytorch; mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
3d-human-pose-estimation-on-3dpw | SMPLer-X | MPJPE: 75.2
3d-human-pose-estimation-on-ubody | SMPLer-X | PA-PVE-All: 31.9; PA-PVE-Face: 2.8; PA-PVE-Hands: 10.3; PVE-All: 57.5; PVE-Face: 21.6; PVE-Hands: 40.2
3d-human-reconstruction-on-ehf | SMPLer-X | MPVPE: 62.4; PA V2V (mm), whole body: 37.1
3d-multi-person-mesh-recovery-on-agora | SMPLer-X | B-NMVE: 68.3; F-MVE: 29.9; FB-MVE: 99.7; FB-NMVE: 107.2; LH/RH-MVE: 39.3
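
The page does not define the metrics listed above. As a rough guide, MPJPE and PVE (also written MVE or MPVPE) are usually computed as the mean Euclidean distance, in millimetres, between predicted and ground-truth 3D joints or mesh vertices, often after aligning both at a root joint. The sketch below assumes that common definition. The function names, the choice of root index, and the toy coordinates are illustrative and are not taken from the SMPLer-X code. The PA- variants additionally apply a Procrustes alignment (rotation, scale, translation) before measuring, which is not shown here.

```python
import math


def mean_point_error(pred, gt):
    """Mean Euclidean distance between corresponding 3D points.

    The result is in the same unit as the inputs (millimetres here).
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def root_align(points, root_index=0):
    """Translate points so that the point at root_index sits at the origin."""
    rx, ry, rz = points[root_index]
    return [(x - rx, y - ry, z - rz) for x, y, z in points]


# Toy coordinates in millimetres, made up for illustration only.
gt = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0), (0.0, 200.0, 0.0)]
pred = [(10.0, 0.0, 0.0), (110.0, 0.0, 0.0), (10.0, 203.0, 4.0)]

# Error without any alignment: the global 10 mm offset dominates.
raw_error = mean_point_error(pred, gt)

# Error after root alignment: only the residual pose difference remains.
aligned_error = mean_point_error(root_align(pred), root_align(gt))

print(f"raw: {raw_error:.2f} mm, root-aligned: {aligned_error:.2f} mm")
```

The same function serves for joints (MPJPE) and for mesh vertices (PVE). Only the point set passed in changes.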
