
Towards Good Practices for Deep 3D Hand Pose Estimation

Guo Hengkai, Wang Guijin, Chen Xinghao, Zhang Cairong

Abstract

3D hand pose estimation from a single depth image is an important and challenging problem for human-computer interaction. Recently, deep convolutional networks (ConvNets) with sophisticated designs have been employed to address it, but the improvement over traditional random-forest-based methods is not so apparent. To explore good practices and improve the performance of hand pose estimation, we propose a tree-structured Region Ensemble Network (REN) for direct 3D coordinate regression. It first partitions the last convolutional outputs of the ConvNet into several grid regions. The results from separate fully-connected (FC) regressors on each region are then integrated by another FC layer to perform the estimation. By exploiting several training strategies, including data augmentation and smooth $L_1$ loss, the proposed REN significantly improves the performance of the ConvNet in localizing hand joints. The experimental results demonstrate that our approach achieves the best performance among state-of-the-art algorithms on three public hand pose datasets. We also evaluate our method on fingertip detection and human pose datasets and obtain state-of-the-art accuracy.
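The region-ensemble idea and the smooth $L_1$ loss described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the single-channel 4x4 feature map, the 2x2 grid, the layer sizes, the random weights, and all function names (`smooth_l1`, `split_regions`, `linear`, `make_fc`, `ren_forward`) are assumptions chosen only to keep the example small and self-contained.

```python
import random


def smooth_l1(pred, target):
    """Mean smooth L1 loss: 0.5*d^2 when |d| < 1, otherwise |d| - 0.5."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total / len(pred)


def split_regions(fmap, grid=2):
    """Split an H x W feature map (list of rows) into grid*grid flattened regions."""
    h, w = len(fmap), len(fmap[0])
    rh, rw = h // grid, w // grid
    regions = []
    for gi in range(grid):
        for gj in range(grid):
            regions.append([fmap[i][j]
                            for i in range(gi * rh, (gi + 1) * rh)
                            for j in range(gj * rw, (gj + 1) * rw)])
    return regions


def linear(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def make_fc(n_in, n_out, rng):
    """Hypothetical helper: small random weights and zero bias."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def ren_forward(fmap, region_fcs, fusion_fc, grid=2):
    """One FC regressor per region, then a fusion FC layer over all of them."""
    feats = []
    for region, (w, b) in zip(split_regions(fmap, grid), region_fcs):
        feats.extend(max(0.0, v) for v in linear(region, w, b))  # ReLU
    w, b = fusion_fc
    return linear(feats, w, b)


# Toy setup (assumed sizes): 4x4 map, 2x2 grid, 8 hidden units, 2 joints.
rng = random.Random(0)
fmap = [[float(4 * i + j) for j in range(4)] for i in range(4)]
region_fcs = [make_fc(4, 8, rng) for _ in range(4)]
fusion_fc = make_fc(4 * 8, 2 * 3, rng)  # output: (x, y, z) per joint

coords = ren_forward(fmap, region_fcs, fusion_fc)
loss = smooth_l1(coords, [0.0] * len(coords))
```

In a real network each region would span many channels and the weights would be learned by backpropagating the loss; the sketch only shows how the region branches feed a single fusion layer that regresses the joint coordinates.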

Benchmarks

Benchmark	Methodology	Metrics
hand-pose-estimation-on-icvl-hands	Tree Region Ensemble Network	Average 3D Error: 7.31
hand-pose-estimation-on-nyu-hands	REN	Average 3D Error: 15.6
pose-estimation-on-itop-front-view	REN	Mean mAP: 84.9
pose-estimation-on-itop-top-view	REN	Mean mAP: 75.5
