
MotionAGFormer: Enhancing 3D Human Pose Estimation with a Transformer-GCNFormer Network


Abstract

Recent transformer-based approaches have demonstrated excellent performance in 3D human pose estimation. However, they typically take a global view, encoding relations among all joints, and fail to capture local dependencies precisely. This paper proposes a novel Attention-GCNFormer (AGFormer) block that splits the channels between two parallel streams, a Transformer stream and a GCNFormer stream. The proposed GCNFormer module exploits local relations between adjacent joints and produces a new representation that is complementary to the Transformer output. By fusing these two representations adaptively, AGFormer exhibits a stronger ability to learn the underlying 3D structure. By stacking multiple AGFormer blocks, we propose MotionAGFormer in four different variants, which can be chosen according to the speed-accuracy trade-off. We evaluate our model on two popular benchmark datasets, Human3.6M and MPI-INF-3DHP. MotionAGFormer-B achieves state-of-the-art results, with P1 errors of 38.4 mm and 16.2 mm on these datasets, respectively. Remarkably, it uses a quarter of the parameters and is three times more computationally efficient than the previous leading model on the Human3.6M dataset. Code and models are available at https://github.com/TaatiTeam/MotionAGFormer.
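The core idea described above — splitting the channels between a global attention stream and a skeleton-graph stream, then fusing the two adaptively — can be illustrated with a minimal NumPy sketch. This is an illustrative simplification, not the paper's implementation: the real AGFormer block uses learned projections, multi-head attention, and a learned per-element fusion, none of which are reproduced here.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_stream(x):
    # Global self-attention over all joints: every joint attends to every
    # other joint, capturing long-range dependencies. (Simplified: no
    # learned query/key/value projections, single head.)
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    return softmax(scores, axis=-1) @ x

def gcn_stream(x, adjacency):
    # Graph convolution over the skeleton: each joint aggregates features
    # only from physically adjacent joints, capturing local structure.
    deg = adjacency.sum(axis=-1, keepdims=True)
    return (adjacency / deg) @ x

def agformer_block(x, adjacency, alpha_logits):
    # Split the channels between the two parallel streams.
    half = x.shape[-1] // 2
    x_attn = attention_stream(x[..., :half])
    x_gcn = gcn_stream(x[..., half:], adjacency)
    # Adaptive fusion, caricatured as a single learned pair of logits
    # normalized with softmax, deciding how much each branch contributes
    # before the halves are concatenated back together.
    w = softmax(alpha_logits)
    return np.concatenate([w[0] * x_attn, w[1] * x_gcn], axis=-1)

# Toy example: batch of 2 poses, 17 joints, 8 feature channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 17, 8))
# Chain adjacency with self-loops as a stand-in skeleton graph.
adj = np.eye(17) + np.eye(17, k=1) + np.eye(17, k=-1)
out = agformer_block(x, adj, alpha_logits=np.array([0.0, 0.0]))
print(out.shape)  # (2, 17, 8)
```

Stacking several such blocks, as the paper does, lets the global and local representations refine each other layer by layer.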

Code Repository

taatiteam/motionagformer
Official
PyTorch
Mentioned on GitHub

Benchmarks

Benchmark / Method / Metrics

3d-human-pose-estimation-on-human36m — MotionAGFormer-B (T=243)
#Frames: 243
Average MPJPE (mm): 19.4
Multi-View or Monocular: Monocular

3d-human-pose-estimation-on-human36m — MotionAGFormer-XS (T=27)
#Frames: 27
Average MPJPE (mm): 28.1
Multi-View or Monocular: Monocular

3d-human-pose-estimation-on-human36m — MotionAGFormer-S (T=81)
#Frames: 81
Average MPJPE (mm): 26.5
Multi-View or Monocular: Monocular

3d-human-pose-estimation-on-human36m — MotionAGFormer-L (T=243)
#Frames: 243
Average MPJPE (mm): 17.3
Multi-View or Monocular: Monocular

3d-human-pose-estimation-on-mpi-inf-3dhp — MotionAGFormer-L (T=81)
AUC: 85.3
MPJPE: 16.2
PCK: 98.2

3d-human-pose-estimation-on-mpi-inf-3dhp — MotionAGFormer-XS (T=27)
AUC: 83.5
MPJPE: 19.2
PCK: 98.2

3d-human-pose-estimation-on-mpi-inf-3dhp — MotionAGFormer-B (T=81)
AUC: 84.2
MPJPE: 18.2
PCK: 98.3

3d-human-pose-estimation-on-mpi-inf-3dhp — MotionAGFormer-S (T=81)
AUC: 84.5
MPJPE: 17.1
PCK: 98.3

classification-on-full-body-parkinsons — MotionAGFormer
F1-score (weighted): 0.42

monocular-3d-human-pose-estimation-on-human3 — MotionAGFormer-B
2D detector: SH
Average MPJPE (mm): 38.4
Frames Needed: 243
Need Ground Truth 2D Pose: No
Use Video Sequence: Yes

monocular-3d-human-pose-estimation-on-human3 — MotionAGFormer-S
2D detector: SH
Average MPJPE (mm): 42.5
Frames Needed: 81
Need Ground Truth 2D Pose: No
Use Video Sequence: Yes

monocular-3d-human-pose-estimation-on-human3 — MotionAGFormer-XS
2D detector: SH
Average MPJPE (mm): 45.1
Frames Needed: 27
Need Ground Truth 2D Pose: No
Use Video Sequence: Yes

monocular-3d-human-pose-estimation-on-human3 — MotionAGFormer-L
2D detector: SH
Average MPJPE (mm): 38.4
Frames Needed: 243
Need Ground Truth 2D Pose: No
Use Video Sequence: Yes
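The MPJPE figures above are Mean Per Joint Position Error: the Euclidean distance between each predicted and ground-truth 3D joint, averaged over joints and frames, reported in millimetres (the abstract's "P1 error" refers to this protocol, without rigid alignment). A minimal sketch of the metric:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per Joint Position Error: mean Euclidean distance between
    predicted and ground-truth 3D joints (in the units of the input,
    typically mm), averaged over all joints and frames."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

# Toy example: 2 frames, 17 joints, xyz coordinates in mm.
gt = np.zeros((2, 17, 3))
pred = gt.copy()
pred[..., 0] += 10.0  # offset every joint by 10 mm along x
print(mpjpe(pred, gt))  # 10.0
```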

