SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation

Abstract

Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications. Despite encouraging progress, current state-of-the-art methods still depend largely on a confined set of training datasets. In this work, we investigate scaling up EHPS towards the first generalist foundation model (dubbed SMPLer-X), with up to ViT-Huge as the backbone and training with up to 4.5M instances from diverse data sources. With big data and the large model, SMPLer-X exhibits strong performance across diverse test benchmarks and excellent transferability to even unseen environments. 1) For the data scaling, we perform a systematic investigation on 32 EHPS datasets, including a wide range of scenarios that a model trained on any single dataset cannot handle. More importantly, capitalizing on insights obtained from the extensive benchmarking process, we optimize our training scheme and select datasets that lead to a significant leap in EHPS capabilities. 2) For the model scaling, we take advantage of vision transformers to study the scaling law of model sizes in EHPS. Moreover, our finetuning strategy turns SMPLer-X into specialist models, allowing them to achieve further performance boosts. Notably, our foundation model SMPLer-X consistently delivers state-of-the-art results on seven benchmarks such as AGORA (107.2 mm NMVE), UBody (57.4 mm PVE), EgoBody (63.6 mm PVE), and EHF (62.3 mm PVE without finetuning). Homepage: https://caizhongang.github.io/projects/SMPLer-X/
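
For readers unfamiliar with the PVE figures quoted above, the following is a minimal sketch of a per-vertex error computation, assuming PVE means the mean Euclidean distance between corresponding predicted and ground-truth mesh vertices, expressed in millimetres. The function name `per_vertex_error_mm` and the toy inputs are hypothetical and are not taken from the SMPLer-X code. Benchmark protocols typically also align the two meshes (for example at a root joint) before measuring; that step is omitted here.

```python
import math
from typing import Sequence, Tuple

Vertex = Tuple[float, float, float]


def per_vertex_error_mm(pred: Sequence[Vertex], gt: Sequence[Vertex]) -> float:
    """Mean Euclidean distance between corresponding vertices.

    Assumes both meshes share the same vertex ordering and that the
    coordinates are already expressed in millimetres. No alignment is
    applied (an assumption of this sketch, not of any benchmark).
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("meshes must be non-empty and have the same vertex count")
    # Sum the distance of each predicted vertex to its ground-truth counterpart.
    total = sum(math.dist(p, g) for p, g in zip(pred, gt))
    return total / len(pred)


# Toy example: one vertex matches exactly, the other is 5 mm away.
predicted = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
ground_truth = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(per_vertex_error_mm(predicted, ground_truth))  # 2.5
```

Lower values indicate a predicted mesh that lies closer to the ground truth.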

