
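Abstract
The human skeleton, as a compact representation of human action, has received increasing attention in recent years. Many skeleton-based action recognition methods adopt graph convolutional networks (GCN) to extract features from human skeletons. Despite the positive results shown in previous work, GCN-based methods remain limited in robustness, interoperability, and scalability. In this work, we propose PoseC3D, a new approach to skeleton-based action recognition that relies on a 3D heatmap stack, rather than a graph sequence, as the base representation of human skeletons. Compared with GCN-based methods, PoseC3D is more effective at learning spatiotemporal features, more robust to pose estimation noise, and generalizes better in cross-dataset settings. In addition, PoseC3D can handle multi-person scenarios without additional computation cost, and its features can be easily integrated with other modalities at early fusion stages, which provides a large design space for further improving performance. On four challenging datasets, PoseC3D consistently obtains superior performance, both when used alone on skeletons and when combined with the RGB modality.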
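The abstract does not spell out how a 3D heatmap stack is built from skeletons, so the following is only a minimal sketch under stated assumptions: each joint is rendered as a 2D Gaussian centred on its estimated position and scaled by its confidence score, per-frame maps are stacked along time, and multiple persons are merged with a pixel-wise maximum so that the output size does not grow with the number of people. The function name `keypoints_to_heatmap_volume`, the array layouts, and the value of `sigma` are illustrative choices, not taken from the text above.

```python
import numpy as np


def keypoints_to_heatmap_volume(keypoints, scores, height, width, sigma=0.6):
    """Turn 2D keypoints into a stack of per-joint heatmaps (illustrative sketch).

    keypoints: array of shape (M, T, K, 2) with (x, y) pixel coordinates
               for M persons, T frames, and K joints.
    scores:    array of shape (M, T, K) with keypoint confidences.
    sigma:     Gaussian width in pixels (assumed value, chosen for illustration).
    Returns an array of shape (K, T, height, width).
    """
    keypoints = np.asarray(keypoints, dtype=np.float32)
    scores = np.asarray(scores, dtype=np.float32)

    # Pixel coordinate grids that broadcast against (M, T, K, H, W).
    ys = np.arange(height, dtype=np.float32)[:, None]  # (H, 1)
    xs = np.arange(width, dtype=np.float32)[None, :]   # (1, W)

    kx = keypoints[..., 0][..., None, None]            # (M, T, K, 1, 1)
    ky = keypoints[..., 1][..., None, None]            # (M, T, K, 1, 1)
    sq_dist = (xs - kx) ** 2 + (ys - ky) ** 2          # (M, T, K, H, W)

    # Gaussian centred on each joint, scaled by that joint's confidence.
    maps = np.exp(-sq_dist / (2.0 * sigma ** 2)) * scores[..., None, None]

    # Merge persons with a pixel-wise max: the result no longer depends on M.
    merged = maps.max(axis=0)                          # (T, K, H, W)

    # Reorder to (K, T, H, W) so that joints play the role of channels.
    return merged.transpose(1, 0, 2, 3)


# Usage: 2 persons, 8 frames, 17 joints, rendered on a 56 x 56 grid.
rng = np.random.default_rng(0)
demo_keypoints = rng.uniform(0, 56, size=(2, 8, 17, 2))
demo_scores = np.ones((2, 8, 17))
volume = keypoints_to_heatmap_volume(demo_keypoints, demo_scores, 56, 56)
print(volume.shape)  # (17, 8, 56, 56)
```

Because the persons are collapsed before the volume is returned, a downstream network would see the same input shape for one person or several, which is one plausible reading of the "no additional computation cost" claim. Noisy keypoints with low confidence also contribute weaker responses under this scheme.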
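Code repositories
| Repository | Framework | Note |
|---|---|---|
| sandman002/One-Style-is-All-You-Need-to-Generate-a-Video | pytorch | Mentioned in GitHub |
| kennymckormick/pyskl | pytorch | Mentioned in GitHub |
| txyugood/PaddlePoseC3D | paddle | |
| open-mmlab/mmaction2 | pytorch | |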
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| 3d-action-recognition-on-assembly101 | RGBPoseConv3D | Actions Top-1: 33.61; Object Top-1: 42.90; Verbs Top-1: 61.99 |
| action-recognition-in-videos-on-ntu-rgbd | PoseC3D (RGB + Pose) | Accuracy (CS): 97.0; Accuracy (CV): 99.6 |
| action-recognition-in-videos-on-ntu-rgbd-120 | PoseC3D (RGB + Pose) | Accuracy (Cross-Setup): 96.4; Accuracy (Cross-Subject): 95.3 |
| action-recognition-in-videos-on-volleyball | PoseC3D (Pose Only) | Accuracy: 91.3 |
| action-recognition-on-h2o-2-hands-and-objects | RGBPoseConv3D | Actions Top-1: 83.47; Hand Pose: 2D; Object Label: No; Object Pose: No; RGB: Yes |
| group-activity-recognition-on-volleyball | PoseC3D (Pose-Only) | Accuracy: 91.3 |
| skeleton-based-action-recognition-on-kinetics | PoseC3D | Accuracy: 47.7 |
| skeleton-based-action-recognition-on-kinetics | PoseC3D (SlowOnly-346) | Accuracy: 49.1 |
| skeleton-based-action-recognition-on-ntu-rgbd | PoseC3D [3D Heatmap] | Accuracy (CS): 94.1; Accuracy (CV): 97.1; Ensembled Modalities: 2 |
| skeleton-based-action-recognition-on-ntu-rgbd-1 | PoseC3D (w. HRNet 2D Skeleton) | Accuracy (Cross-Setup): 90.3; Accuracy (Cross-Subject): 86.9 |