
Abstract
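We present a benchmark for 3D human whole-body pose estimation, which involves identifying accurate 3D keypoints on the entire human body, including the face, hands, body, and feet. Currently, because no fully annotated and accurate 3D whole-body dataset exists, deep networks are either trained separately on specific body parts and combined at inference time, or they rely on pseudo-ground-truth provided by parametric body models, which is less accurate than detection-based methods. To overcome these issues, we introduce the Human3.6M 3D WholeBody (H3WB) dataset, which provides whole-body annotations for the Human3.6M dataset using the COCO-WholeBody layout. H3WB comprises 133 whole-body keypoint annotations on 100K images, made possible by our new multi-view pipeline. We also propose three tasks: i) 3D whole-body pose lifting from a complete 2D whole-body pose; ii) 3D whole-body pose lifting from an incomplete 2D whole-body pose; iii) 3D whole-body pose estimation from a single RGB image. We report baseline results from several popular methods on these tasks. Furthermore, we provide automated 3D whole-body annotations of TotalCapture and show experimentally that using them together with H3WB improves performance. Code and dataset are available at https://github.com/wholebody3d/wholebody3d.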
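To make the 133-keypoint layout and the incomplete-input setting of task ii) more concrete, here is a minimal sketch. It assumes the standard COCO-WholeBody ordering (17 body, 6 foot, 68 face, and 2 × 21 hand keypoints). The helper names `mask_part` and `mask_random`, the zero-fill convention, and the default drop probability are hypothetical illustrations, not the masking protocol used by the H3WB authors.

```python
import random

# End-exclusive index ranges of the 133-keypoint COCO-WholeBody ordering.
PARTS = {
    "body": (0, 17),
    "feet": (17, 23),
    "face": (23, 91),
    "left_hand": (91, 112),
    "right_hand": (112, 133),
}
NUM_KEYPOINTS = 133


def mask_part(pose_2d, part):
    """Hide every keypoint of one body part.

    Returns (masked_pose, visible). Hidden keypoints are replaced by
    (0.0, 0.0) and their visibility flag is False (an assumed convention).
    """
    if len(pose_2d) != NUM_KEYPOINTS:
        raise ValueError(f"expected {NUM_KEYPOINTS} keypoints, got {len(pose_2d)}")
    start, end = PARTS[part]
    visible = [not (start <= i < end) for i in range(NUM_KEYPOINTS)]
    masked = [xy if keep else (0.0, 0.0) for xy, keep in zip(pose_2d, visible)]
    return masked, visible


def mask_random(pose_2d, drop_prob=0.25, seed=None):
    """Hide each keypoint independently with probability drop_prob.

    The default drop_prob is a placeholder, not a value from the paper.
    """
    rng = random.Random(seed)
    visible = [rng.random() >= drop_prob for _ in pose_2d]
    masked = [xy if keep else (0.0, 0.0) for xy, keep in zip(pose_2d, visible)]
    return masked, visible


# Usage: a dummy 2D pose with the face region hidden.
pose = [(float(i), float(i)) for i in range(NUM_KEYPOINTS)]
masked_pose, visible = mask_part(pose, "face")
```

A lifting network for task ii) would receive `masked_pose` (optionally with `visible` as an extra input channel) and still be asked to predict all 133 keypoints in 3D.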
Code repository
wholebody3d/wholebody3d
Official
pytorch
Mentioned in GitHub
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| 3d-facial-landmark-localization-on-h3wb | SimpleBaseline | Average MPJPE (mm): 34.0 |
| 3d-facial-landmark-localization-on-h3wb | CanonPose | Average MPJPE (mm): 31.9 |
| 3d-facial-landmark-localization-on-h3wb | SHN + SimpleBaseline | Average MPJPE (mm): 32.5 |
| 3d-facial-landmark-localization-on-h3wb | CPN + Jointformer | Average MPJPE (mm): 20.7 |
| 3d-facial-landmark-localization-on-h3wb | Jointformer | Average MPJPE (mm): 19.8 |
| 3d-facial-landmark-localization-on-h3wb | SimpleBaseline | Average MPJPE (mm): 24.6 |
| 3d-facial-landmark-localization-on-h3wb | CanonPose | Average MPJPE (mm): 24.6 |
| 3d-facial-landmark-localization-on-h3wb | CanonPose + 3D supervision | Average MPJPE (mm): 22.2 |
| 3d-facial-landmark-localization-on-h3wb | Resnet50 | Average MPJPE (mm): 26.3 |
| 3d-facial-landmark-localization-on-h3wb | Large SimpleBaseline | Average MPJPE (mm): 14.6 |
| 3d-facial-landmark-localization-on-h3wb | Large SimpleBaseline | Average MPJPE (mm): 19.8 |
| 3d-facial-landmark-localization-on-h3wb | CanonPose + 3D supervision | Average MPJPE (mm): 17.9 |
| 3d-facial-landmark-localization-on-h3wb | Jointformer | Average MPJPE (mm): 17.8 |
| 3d-hand-pose-estimation-on-h3wb | Large SimpleBaseline | Average MPJPE (mm): 44.8 |
| 3d-hand-pose-estimation-on-h3wb | CPN + Jointformer | Average MPJPE (mm): 56.9 |
| 3d-hand-pose-estimation-on-h3wb | SimpleBaseline | Average MPJPE (mm): 83.4 |
| 3d-hand-pose-estimation-on-h3wb | Large SimpleBaseline | Average MPJPE (mm): 31.7 |
| 3d-hand-pose-estimation-on-h3wb | Jointformer | Average MPJPE (mm): 43.7 |
| 3d-hand-pose-estimation-on-h3wb | SimpleBaseline | Average MPJPE (mm): 42.5 |
| 3d-hand-pose-estimation-on-h3wb | SHN + SimpleBaseline | Average MPJPE (mm): 64.3 |
| 3d-hand-pose-estimation-on-h3wb | CanonPose + 3D supervision | Average MPJPE (mm): 47.4 |
| 3d-hand-pose-estimation-on-h3wb | CanonPose | Average MPJPE (mm): 48.9 |
| 3d-hand-pose-estimation-on-h3wb | Resnet50 | Average MPJPE (mm): 63.1 |
| 3d-hand-pose-estimation-on-h3wb | Jointformer | Average MPJPE (mm): 53.5 |
| 3d-hand-pose-estimation-on-h3wb | CanonPose | Average MPJPE (mm): 56.2 |
| 3d-hand-pose-estimation-on-h3wb | CanonPose + 3D supervision | Average MPJPE (mm): 38.3 |
| 3d-human-pose-estimation-on-h3wb | CanonPose | MPJPE: 193.7 |
| 3d-human-pose-estimation-on-h3wb | CPN + Jointformer | MPJPE: 142.8 |
| 3d-human-pose-estimation-on-h3wb | Jointformer | MPJPE: 103.0 |
| 3d-human-pose-estimation-on-h3wb | Large SimpleBaseline | MPJPE: 131.6 |
| 3d-human-pose-estimation-on-h3wb | CanonPose | MPJPE: 264.4 |
| 3d-human-pose-estimation-on-h3wb | CanonPose + 3D supervision | MPJPE: 155.9 |
| 3d-human-pose-estimation-on-h3wb | Large SimpleBaseline | MPJPE: 112.6 |
| 3d-human-pose-estimation-on-h3wb | CanonPose + 3D supervision | MPJPE: 117.5 |
| 3d-human-pose-estimation-on-h3wb | Resnet50 | MPJPE: 151.6 |
| 3d-human-pose-estimation-on-h3wb | SimpleBaseline | MPJPE: 252.0 |
| 3d-human-pose-estimation-on-h3wb | Jointformer | MPJPE: 84.9 |
| 3d-human-pose-estimation-on-h3wb | SHN + SimpleBaseline | MPJPE: 189.6 |
| 3d-human-pose-estimation-on-h3wb | SimpleBaseline | MPJPE: 125.7 |
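The metric in the table, MPJPE (mean per-joint position error), is the average Euclidean distance between predicted and ground-truth 3D keypoints. The sketch below uses only the Python standard library. The function names and the root-centering step are illustrative assumptions: the indices `(11, 12)` are the left and right hip in the COCO ordering, and centering on their midpoint is one common alignment choice, not necessarily the exact protocol behind the numbers above.

```python
import math


def mpjpe(pred, target):
    """Mean Euclidean distance between corresponding 3D keypoints."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally long")
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)


def center_on(pose, root_indices=(11, 12)):
    """Translate a pose so the mean of the root keypoints is the origin.

    root_indices defaults to the two hips (an assumed pelvis proxy).
    """
    roots = [pose[i] for i in root_indices]
    origin = [sum(axis) / len(roots) for axis in zip(*roots)]
    return [tuple(c - o for c, o in zip(point, origin)) for point in pose]


def aligned_mpjpe(pred, target, root_indices=(11, 12)):
    """MPJPE after centering both poses on the same root keypoints."""
    return mpjpe(center_on(pred, root_indices), center_on(target, root_indices))


# Usage: two keypoints, one exact and one off by a 3-4-5 triangle.
error = mpjpe([(0, 0, 0), (3, 4, 0)], [(0, 0, 0), (0, 0, 0)])  # 2.5
```

Per-part scores such as the face and hand rows can be obtained by slicing both poses to the relevant index range before calling `mpjpe`.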