MotionAGFormer: Enhancing 3D Human Pose Estimation with a Transformer-GCNFormer Network

Soroush Mehraban; Vida Adeli; Babak Taati


Abstract

Recent transformer-based approaches have demonstrated excellent performance in 3D human pose estimation. However, they have a holistic view and, by encoding global relationships between all the joints, they do not capture the local dependencies precisely. In this paper, we present a novel Attention-GCNFormer (AGFormer) block that divides the number of channels by using two parallel transformer and GCNFormer streams. Our proposed GCNFormer module exploits the local relationship between adjacent joints, outputting a new representation that is complementary to the transformer output. By fusing these two representations in an adaptive way, AGFormer exhibits the ability to better learn the underlying 3D structure. By stacking multiple AGFormer blocks, we propose MotionAGFormer in four different variants, which can be chosen based on the speed-accuracy trade-off. We evaluate our model on two popular benchmark datasets: Human3.6M and MPI-INF-3DHP. MotionAGFormer-B achieves state-of-the-art results, with P1 errors of 38.4 mm and 16.2 mm, respectively. Remarkably, it uses a quarter of the parameters and is three times more computationally efficient than the previous leading model on the Human3.6M dataset. Code and models are available at https://github.com/TaatiTeam/MotionAGFormer.
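The abstract describes fusing the transformer and GCNFormer stream outputs "in an adaptive way". The page does not spell out the fusion operator, but a common form of adaptive fusion is a learned, per-element convex combination: concatenate the two stream outputs, project to two logits, softmax, and take the weighted sum. The sketch below illustrates that pattern in plain numpy; the function names, shapes, and the exact fusion form are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def adaptive_fusion(x_attn, x_gcn, w_alpha, b_alpha):
    """Fuse two parallel stream outputs with learned per-element weights.

    x_attn, x_gcn : (T, J, C) outputs of the transformer and GCNFormer streams
                    (T frames, J joints, C channels)
    w_alpha       : (2*C, 2) projection producing two fusion logits per element
    b_alpha       : (2,)     bias for the projection

    Returns the fused (T, J, C) representation: a convex combination of the
    two streams, with weights predicted from both streams jointly.
    """
    concat = np.concatenate([x_attn, x_gcn], axis=-1)          # (T, J, 2C)
    logits = concat @ w_alpha + b_alpha                        # (T, J, 2)
    # Softmax over the two streams -> per-element combination weights in (0, 1)
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    alpha = e / e.sum(axis=-1, keepdims=True)                  # (T, J, 2)
    return alpha[..., :1] * x_attn + alpha[..., 1:] * x_gcn

# Toy usage: 27 frames, 17 joints (Human3.6M skeleton), 8 channels
rng = np.random.default_rng(0)
T, J, C = 27, 17, 8
x_attn = rng.standard_normal((T, J, C))
x_gcn = rng.standard_normal((T, J, C))
fused = adaptive_fusion(x_attn, x_gcn,
                        rng.standard_normal((2 * C, 2)) * 0.1,
                        np.zeros(2))
```

Because the weights come from a softmax, each fused element always lies between the corresponding elements of the two streams, so neither stream can be amplified beyond its own output.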

Code Repositories

taatiteam/motionagformer (official implementation, PyTorch)

Benchmarks

3D Human Pose Estimation on Human3.6M (Monocular)

  Method                     #Frames   Average MPJPE (mm)
  MotionAGFormer-XS (T=27)   27        28.1
  MotionAGFormer-S (T=81)    81        26.5
  MotionAGFormer-B (T=243)   243       19.4
  MotionAGFormer-L (T=243)   243       17.3

3D Human Pose Estimation on MPI-INF-3DHP

  Method                     MPJPE   AUC    PCK
  MotionAGFormer-XS (T=27)   19.2    83.5   98.2
  MotionAGFormer-S (T=81)    17.1    84.5   98.3
  MotionAGFormer-B (T=81)    18.2    84.2   98.3
  MotionAGFormer-L (T=81)    16.2    85.3   98.2

Classification on Full-Body Parkinson's

  MotionAGFormer: F1-score (weighted) 0.42

Monocular 3D Human Pose Estimation on Human3.6M
(2D detector: SH; no ground-truth 2D pose needed; uses video sequence)

  Method              Frames Needed   Average MPJPE (mm)
  MotionAGFormer-XS   27              45.1
  MotionAGFormer-S    81              42.5
  MotionAGFormer-B    243             38.4
  MotionAGFormer-L    243             38.4
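For reference, the Average MPJPE metric reported in these tables is the Mean Per-Joint Position Error: the Euclidean distance between predicted and ground-truth 3D joint positions, averaged over all joints and frames. A minimal sketch (array shapes and the toy data are illustrative only):

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error.

    pred, gt : (N, J, 3) arrays of predicted / ground-truth 3D joint
               coordinates in millimetres (N poses, J joints).
    Returns the mean Euclidean distance over all joints and poses.
    """
    return np.linalg.norm(pred - gt, axis=-1).mean()

# Toy example: 2 poses, 17 joints (the Human3.6M skeleton size)
gt = np.zeros((2, 17, 3))
pred = gt.copy()
pred[..., 0] += 10.0    # shift every joint 10 mm along the x-axis
print(mpjpe(pred, gt))  # -> 10.0
```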
