Dominik Kulon, Riza Alp Güler, Iasonas Kokkinos, Michael Bronstein, Stefanos Zafeiriou

Abstract
We introduce a simple and effective network architecture for monocular 3D hand pose estimation, consisting of an image encoder followed by a mesh convolutional decoder that is trained through a direct 3D hand mesh reconstruction loss. We train our network by gathering a large-scale dataset of hand actions in YouTube videos and using it as a source of weak supervision. Our weakly-supervised, mesh-convolution-based system substantially outperforms state-of-the-art methods, even halving the error on the in-the-wild benchmark. The dataset and additional resources are available at https://arielai.com/mesh_hands.
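The abstract does not spell out the form of the mesh reconstruction loss. As a minimal sketch only, assuming a plain per-vertex L1 term between a predicted and a reference mesh that share the same vertex ordering (the function name `vertex_l1_loss` and the choice of L1 are illustrative assumptions, not taken from the paper), such a loss could look like this:

```python
import numpy as np


def vertex_l1_loss(pred_vertices, target_vertices):
    """Mean absolute difference between two (V, 3) vertex arrays.

    Hypothetical helper: assumes both meshes use the same topology,
    so that row i of each array refers to the same vertex.
    """
    pred = np.asarray(pred_vertices, dtype=float)
    target = np.asarray(target_vertices, dtype=float)
    if pred.shape != target.shape:
        raise ValueError("meshes must share the same vertex layout")
    # Average the absolute error over every vertex coordinate.
    return float(np.abs(pred - target).mean())
```

Because the loss is computed directly on vertex positions, the decoder is supervised on the full mesh rather than on a sparse set of joints.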
Benchmarks
| Benchmark | Methodology | PA-F@15mm | PA-F@5mm | PA-MPJPE | PA-MPVPE |
|---|---|---|---|---|---|
| 3d-hand-pose-estimation-on-freihand | YoutubeHand | 0.966 | 0.614 | 8.4 | 8.6 |
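The "PA-" prefix on these metrics conventionally denotes Procrustes alignment: the prediction is first fitted to the ground truth by a similarity transform (rotation, uniform scale, translation), and the error is measured afterwards. MPJPE is the mean per-joint position error, and MPVPE is the same quantity computed over mesh vertices. The following is a sketch of how PA-MPJPE is typically computed, not the benchmark's official evaluation code; the function names are hypothetical.

```python
import numpy as np


def procrustes_align(pred, target):
    """Fit pred (N, 3) to target (N, 3) with rotation, uniform scale, translation."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mu_p, mu_t = pred.mean(axis=0), target.mean(axis=0)
    p, t = pred - mu_p, target - mu_t
    # SVD of the 3x3 cross-covariance gives the optimal rotation (Kabsch).
    U, S, Vt = np.linalg.svd(p.T @ t)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    # Least-squares optimal uniform scale for the centered points.
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()
    return scale * p @ R.T + mu_t


def pa_mpjpe(pred, target):
    """Mean Euclidean distance per point after Procrustes alignment."""
    aligned = procrustes_align(pred, target)
    diff = aligned - np.asarray(target, dtype=float)
    return float(np.linalg.norm(diff, axis=1).mean())
```

Passing mesh vertices instead of joints to the same function yields the PA-MPVPE analogue. The result is in whatever unit the inputs use; FreiHAND results are usually reported in millimetres.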