Anurag Arnab; Carl Doersch; Andrew Zisserman

Abstract
We present a bundle-adjustment-based algorithm for recovering accurate 3D human pose and meshes from monocular videos. Unlike previous algorithms which operate on single frames, we show that reconstructing a person over an entire sequence gives extra constraints that can resolve ambiguities. This is because videos often give multiple views of a person, yet the overall body shape does not change and 3D positions vary slowly. Our method improves not only on standard mocap-based datasets like Human 3.6M -- where we show quantitative improvements -- but also on challenging in-the-wild datasets such as Kinetics. Building upon our algorithm, we present a new dataset of more than 3 million frames of YouTube videos from Kinetics with automatically generated 3D poses and meshes. We show that retraining a single-frame 3D pose estimator on this data improves accuracy on both real-world and mocap data by evaluating on the 3DPW and HumanEVA datasets.
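The abstract's key constraints (a single body shape over the whole video, slowly varying 3D joint positions, and agreement with per-frame 2D evidence) can be expressed as one objective optimized over all frames. Below is a minimal PyTorch sketch of such a temporal bundle-adjustment loss, assuming SMPL-style parameters; `joints_3d`, `project`, the weak-perspective camera, the loss weights, and the placeholder keypoint data are illustrative assumptions, not the paper's actual implementation.

```python
import torch

T, J = 190, 24                                   # frames in the clip, joints per frame
beta = torch.zeros(10, requires_grad=True)       # one body shape shared by all frames
theta = torch.zeros(T, 72, requires_grad=True)   # per-frame pose parameters
cam = torch.tensor([1.0, 0.0, 0.0]).repeat(T, 1).requires_grad_()  # per-frame camera [s, tx, ty]
kp2d = torch.rand(T, J, 2)                       # detected 2D keypoints (placeholder data)
conf = torch.ones(T, J, 1)                       # detector confidences (placeholder data)

# Hypothetical linear stand-in for the SMPL joint regressor; the real method
# poses the SMPL body model and regresses 3D joints from its mesh.
W = torch.randn(82, J * 3) * 0.01

def joints_3d(beta, theta):
    params = torch.cat([beta.expand(theta.shape[0], -1), theta], dim=1)  # (T, 82)
    return (params @ W).reshape(-1, J, 3)

def project(x3d, cam):
    # Weak-perspective projection (assumption): scale x,y and translate in the image plane.
    return cam[:, None, 0:1] * x3d[..., :2] + cam[:, None, 1:3]

opt = torch.optim.Adam([beta, theta, cam], lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    j3d = joints_3d(beta, theta)
    j2d = project(j3d, cam)
    reproj = (conf * (j2d - kp2d) ** 2).mean()    # agree with the 2D detections in every frame
    smooth = ((j3d[1:] - j3d[:-1]) ** 2).mean()   # 3D joints should vary slowly over time
    loss = reproj + 10.0 * smooth                 # shape is constant by construction (single beta)
    loss.backward()
    opt.step()
```

Because every frame constrains the same shared shape and the smoothness term couples neighbouring frames, frames in which the 2D evidence is ambiguous borrow information from the rest of the sequence.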
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| 3d-human-pose-estimation-on-3dpw | Bundle Adjustment | PA-MPJPE: 72.2 |
| 3d-human-pose-estimation-on-human36m | Bundle Adjustment | Average MPJPE (mm): 77.8; PA-MPJPE: 41.6 |
| 3d-human-pose-estimation-on-human36m | Bundle Adjustment (GTi) | Average MPJPE (mm): 63.3 |
| monocular-3d-human-pose-estimation-on-human3 | Bundle Adjustment | Frames Needed: 190; Need Ground Truth 2D Pose: No; Use Video Sequence: Yes |
| monocular-3d-human-pose-estimation-on-human3 | Bundle Adjustment (GTi) | Average MPJPE (mm): 63.3 |