Guo Hengkai, Wang Guijin, Chen Xinghao, Zhang Cairong

Abstract
3D hand pose estimation from a single depth image is an important and challenging problem for human-computer interaction. Recently, deep convolutional networks (ConvNets) with sophisticated designs have been employed to address it, but the improvement over traditional random-forest-based methods is not so apparent. To exploit good practices and improve hand pose estimation performance, we propose a tree-structured Region Ensemble Network (REN) for direct 3D coordinate regression. It first partitions the last convolution outputs of the ConvNet into several grid regions. The results from separate fully-connected (FC) regressors on each region are then integrated by another FC layer to perform the estimation. By exploiting several training strategies, including data augmentation and smooth $L_1$ loss, the proposed REN can significantly improve the performance of the ConvNet in localizing hand joints. The experimental results demonstrate that our approach achieves the best performance among state-of-the-art algorithms on three public hand pose datasets. We also evaluate our method on fingertip detection and human pose datasets and obtain state-of-the-art accuracy.
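The region-ensemble idea described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the 2x2 non-overlapping grid, the layer widths, the random weights, the joint count, and all function names (`split_into_grid`, `make_fc`, `fc`, `region_ensemble_forward`, `smooth_l1`) are assumptions chosen only for illustration. The sketch splits a feature map into grid regions, runs a separate FC regressor on each region, fuses the branch outputs with one more FC layer, and scores the result with a smooth $L_1$ loss.

```python
import random


def split_into_grid(fmap, rows=2, cols=2):
    """Split a C x H x W feature map (nested lists) into rows*cols flat regions.

    Assumes H and W are divisible by rows and cols; any remainder is dropped.
    """
    height, width = len(fmap[0]), len(fmap[0][0])
    rh, cw = height // rows, width // cols
    regions = []
    for r in range(rows):
        for c in range(cols):
            regions.append([
                channel[y][x]
                for channel in fmap
                for y in range(r * rh, (r + 1) * rh)
                for x in range(c * cw, (c + 1) * cw)
            ])
    return regions


def make_fc(n_in, n_out, rng):
    """Create a randomly initialised FC layer as (weights, bias)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def fc(layer, x, relu=False):
    """Apply an FC layer to vector x, optionally followed by ReLU."""
    weights, bias = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [max(0.0, o) for o in out] if relu else out


def region_ensemble_forward(fmap, branch_layers, fusion_layer, rows=2, cols=2):
    """One FC regressor per grid region, then a fusion FC layer over all branches."""
    fused_input = []
    for region, layer in zip(split_into_grid(fmap, rows, cols), branch_layers):
        fused_input.extend(fc(layer, region, relu=True))
    return fc(fusion_layer, fused_input)


def smooth_l1(pred, target):
    """Mean smooth L1 loss: quadratic for |d| < 1, linear otherwise."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total / len(pred)


# Hypothetical sizes: 4 channels, 6x6 feature map, 14 joints with 3 coordinates each.
rng = random.Random(0)
C, H, W, J = 4, 6, 6, 14
fmap = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
branches = [make_fc(C * 3 * 3, 8, rng) for _ in range(4)]
fusion = make_fc(4 * 8, 3 * J, rng)
pose = region_ensemble_forward(fmap, branches, fusion)
loss = smooth_l1(pose, [0.0] * (3 * J))
```

Here `pose` is a flat vector of 3D joint coordinates, so the network regresses coordinates directly rather than predicting heatmaps. In a real system the weights would be trained end to end together with the convolutional layers that produce the feature map.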
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| hand-pose-estimation-on-icvl-hands | Tree Region Ensemble Network | Average 3D Error: 7.31 |
| hand-pose-estimation-on-nyu-hands | REN | Average 3D Error: 15.6 |
| pose-estimation-on-itop-front-view | REN | Mean mAP: 84.9 |
| pose-estimation-on-itop-top-view | REN | Mean mAP: 75.5 |