ConvFormer: Parameter Reduction in Transformer Models for 3D Human Pose Estimation by Leveraging Dynamic Multi-Headed Convolutional Attention
Alec Diaz-Arias; Dmitriy Shin

Abstract
Recently, fully-transformer architectures have replaced the de facto convolutional architecture for the 3D human pose estimation task. In this paper we propose ConvFormer, a novel convolutional transformer that leverages a new dynamic multi-headed convolutional self-attention mechanism for monocular 3D human pose estimation. We designed a spatial and temporal convolutional transformer to comprehensively model human joint relations within individual frames and globally across the motion sequence. Moreover, we introduce a novel notion of temporal joints profile for our temporal ConvFormer that fuses complete temporal information immediately for a local neighborhood of joint features. We have quantitatively and qualitatively validated our method on three common benchmark datasets: Human3.6M, MPI-INF-3DHP, and HumanEva. Extensive experiments have been conducted to identify the optimal hyper-parameter set. These experiments demonstrated that we achieved a significant parameter reduction relative to prior transformer models while attaining State-of-the-Art (SOTA) or near SOTA on all three datasets. Additionally, we achieved SOTA for Protocol III on H36M for both GT and CPN detection inputs. Finally, we obtained SOTA on all three metrics for the MPI-INF-3DHP dataset and for all three subjects on HumanEva under Protocol II.
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| 3d-human-pose-estimation-on-human36m | ConvFormer (T=243) | Average MPJPE (mm): 29.8; Multi-View or Monocular: Monocular; Using 2D ground-truth joints: Yes |
| 3d-human-pose-estimation-on-human36m | ConvFormer (T=243, CPN) | Average MPJPE (mm): 43.2; Multi-View or Monocular: Monocular; Using 2D ground-truth joints: No |
| 3d-human-pose-estimation-on-humaneva-i | ConvFormer (T=43) | Mean Reconstruction Error (mm): 24.3 |
| 3d-human-pose-estimation-on-mpi-inf-3dhp | ConvFormer | AUC: 69.8; MPJPE: 53.6; PCK: 96.4 |
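
For readers unfamiliar with the metrics in the table, the following is a minimal sketch of how MPJPE (mean per-joint position error) and PCK (percentage of correct keypoints) are commonly computed. It is not taken from the ConvFormer code. The function names, the nested-list pose layout, and the toy data are illustrative assumptions, and the 150 mm PCK threshold is the value commonly used for MPI-INF-3DHP rather than one stated on this page. The sketch covers neither the rigid alignment that reconstruction-error protocols usually apply before measuring error nor the AUC computation.

```python
import math


def joint_errors(pred, target):
    """Euclidean distance (mm) between each predicted and ground-truth joint.

    pred, target: list of frames, each frame a list of (x, y, z) joints in mm.
    This nested-list layout is an assumption made for illustration.
    """
    return [
        math.dist(p, t)
        for pred_frame, target_frame in zip(pred, target)
        for p, t in zip(pred_frame, target_frame)
    ]


def mpjpe(pred, target):
    """Mean per-joint position error: average joint distance in mm."""
    errors = joint_errors(pred, target)
    return sum(errors) / len(errors)


def pck(pred, target, threshold_mm=150.0):
    """Percentage of joints whose error is at most threshold_mm.

    The 150 mm default is an assumed, commonly used value.
    """
    errors = joint_errors(pred, target)
    return 100.0 * sum(e <= threshold_mm for e in errors) / len(errors)


# Toy data: one frame with two joints (hypothetical values).
target = [[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]
pred = [[(3.0, 4.0, 0.0), (0.0, 0.0, 200.0)]]

print(mpjpe(pred, target))  # 102.5 (errors are 5 mm and 200 mm)
print(pck(pred, target))    # 50.0 (one of two joints within 150 mm)
```

Lower MPJPE is better, while higher PCK and AUC are better, which is why the table reports them separately.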