Learnable Triangulation of Human Pose
Karim Iskakov; Egor Burkov; Victor Lempitsky; Yury Malkov

Abstract
We present two novel solutions for multi-view 3D human pose estimation based on new learnable triangulation methods that combine 3D information from multiple 2D views. The first (baseline) solution is a basic differentiable algebraic triangulation with an addition of confidence weights estimated from the input images. The second solution is based on a novel method of volumetric aggregation from intermediate 2D backbone feature maps. The aggregated volume is then refined via 3D convolutions that produce final 3D joint heatmaps and allow modelling a human pose prior. Crucially, both approaches are end-to-end differentiable, which allows us to directly optimize the target metric. We demonstrate transferability of the solutions across datasets and considerably improve the multi-view state of the art on the Human3.6M dataset. Video demonstration, annotations and additional materials will be posted on our project page (https://saic-violet.github.io/learnable-triangulation).
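The first (baseline) solution described in the abstract is a differentiable algebraic triangulation with per-view confidence weights. The core operation can be illustrated with a weighted Direct Linear Transform (DLT): each camera contributes two linear constraints on the homogeneous 3D point, scaled by its confidence, and the point is recovered as the least-squares null vector via SVD. This is a minimal numpy sketch of that idea, not the authors' implementation; the exact weighting scheme and solver details here are assumptions.

```python
import numpy as np

def weighted_dlt_triangulation(proj_matrices, points_2d, confidences):
    """Triangulate one 3D point from multiple views via weighted DLT.

    proj_matrices: (C, 3, 4) camera projection matrices
    points_2d:     (C, 2) detected 2D joint positions, one per camera
    confidences:   (C,) per-view confidence weights (assumed non-negative)
    """
    rows = []
    for P, (u, v), w in zip(proj_matrices, points_2d, confidences):
        # Each view yields two equations: u * P[2] - P[0] and v * P[2] - P[1],
        # scaled by the view's confidence so unreliable views contribute less.
        rows.append(w * (u * P[2] - P[0]))
        rows.append(w * (v * P[2] - P[1]))
    A = np.stack(rows)
    # The homogeneous solution is the right singular vector associated
    # with the smallest singular value (least-squares null vector of A).
    _, _, vh = np.linalg.svd(A)
    X_hom = vh[-1]
    return X_hom[:3] / X_hom[3]  # dehomogenize to 3D coordinates
```

Because every step (matrix assembly and SVD) is differentiable, gradients can flow from the 3D error back into the confidence weights, which is what makes the triangulation "learnable" in the end-to-end sense the abstract describes.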
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| 3d-human-pose-estimation-on-cmu-panoptic | Learnable Triangulation of Human Pose | Average MPJPE (mm): 13.7 |
| 3d-human-pose-estimation-on-human36m | Learnable Triangulation of Human Pose | Average MPJPE (mm): 20.8; Multi-View or Monocular: Multi-View; Using 2D ground-truth joints: No |
| 3d-human-pose-estimation-on-human36m | Learnable Triangulation of Human Pose (Monocular) | Average MPJPE (mm): 49.9 |
| 3d-human-pose-estimation-on-human36m | Learnable Triangulation of Human Pose (filtered) | Average MPJPE (mm): 17.7; Multi-View or Monocular: Multi-View |
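The benchmark metric above, MPJPE (Mean Per Joint Position Error), is the Euclidean distance between predicted and ground-truth joint positions, averaged over joints (and, for a dataset score, over frames). A minimal numpy sketch of the per-pose computation; evaluation protocols may additionally align poses at the root joint, which is omitted here:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per Joint Position Error, in the units of the inputs (here mm).

    pred, gt: (J, 3) arrays of predicted / ground-truth 3D joint coordinates.
    """
    # Per-joint Euclidean distances, then the mean over all J joints.
    return np.linalg.norm(pred - gt, axis=-1).mean()
```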