Continual Spatio-Temporal Graph Convolutional Networks

Abstract

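Graph-based reasoning over skeleton data has emerged as a highly promising approach to human action recognition. However, most existing graph-based methods take the complete temporal sequence as input, which leads to considerable computational redundancy when they are applied to online inference. To address this, we reformulate the Spatio-Temporal Graph Convolutional Neural Network as a Continual Inference Network, which makes predictions step by step in time without reprocessing frames. To evaluate the approach, we build CoST-GCN, a continual version of ST-GCN, and further propose two derived methods that use different self-attention mechanisms: CoAGCN and CoS-TR. We systematically study how weight transfer strategies and architectural modifications affect inference acceleration, and run experiments on three public datasets: NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400. While retaining similar predictive accuracy, the proposed methods reduce time complexity by up to 109 times, achieve on-hardware speedups of up to 26 times, and cut maximum memory allocation during online inference by 52%.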
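The core idea of continual inference, producing each new prediction without reprocessing earlier frames, can be illustrated with a single temporal convolution. The sketch below is a toy illustration and not the authors' implementation: each frame is reduced to one scalar feature (a real skeleton frame is a tensor over joints and channels), and the names `full_temporal_conv`, `ContinualTemporalConv`, and the kernel values are hypothetical. It shows that caching the most recent inputs yields the same outputs as convolving the whole clip.

```python
from collections import deque


def full_temporal_conv(frames, kernel):
    """Regular temporal convolution over a whole clip (no padding)."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, frames[t : t + k]))
        for t in range(len(frames) - k + 1)
    ]


class ContinualTemporalConv:
    """The same convolution, computed one frame at a time.

    Only the last len(kernel) inputs are cached, so each new frame costs
    a single kernel application instead of a pass over the full window.
    """

    def __init__(self, kernel):
        self.kernel = list(kernel)
        self.buffer = deque(maxlen=len(self.kernel))

    def step(self, frame):
        self.buffer.append(frame)  # the oldest frame is dropped automatically
        if len(self.buffer) < len(self.kernel):
            return None  # warm-up: not enough frames seen yet
        return sum(w * x for w, x in zip(self.kernel, self.buffer))


# Toy data: one scalar feature per frame and a hypothetical kernel.
frames = [0.5, -1.0, 2.0, 3.5, 0.0, 1.5]
kernel = [0.25, 0.5, 0.25]

offline = full_temporal_conv(frames, kernel)

layer = ContinualTemporalConv(kernel)
online = [y for y in (layer.step(f) for f in frames) if y is not None]

print(offline == online)
```

In an online setting only `step` is called as frames arrive. The per-step cost depends on the kernel size rather than on the clip length, which is the kind of redundancy the abstract refers to.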

Benchmarks

Benchmark: skeleton-based-action-recognition-on-kinetics

| Method | Accuracy | GFLOPs per prediction |
| --- | --- | --- |
| ST-GCN (2-stream) | 34.4 | 24.09 |
| CoST-GCN* (1-stream) | 30.2 | 0.11 |
| CoAGCN* (1-stream) | 23.3 | 0.12 |
| CoST-GCN* (2-stream) | 32.2 | 0.22 |
| CoS-TR* (1-stream) | 27.4 | 0.11 |
| CoAGCN (1-stream) | 33 | 0.18 |
| CoST-GCN (1-stream) | 31.8 | 0.16 |
| CoAGCN (2-stream) | not listed | 0.36 |
| CoS-TR* (2-stream) | 29.9 | 0.22 |
| CoST-GCN (2-stream) | 33.1 | 0.32 |
| CoS-TR (2-stream) | 32.7 | 0.31 |
| S-TR (1-stream) | 32 | 11.62 |
| CoS-TR (1-stream) | 29.7 | not listed |
| ST-GCN (1-stream) | 33.4 | 12.04 |
| AGCN (2-stream) | 36.9 | 26.91 |
| CoAGCN* (2-stream) | 27.5 | 0.25 |
| AGCN (1-stream) | 35 | 13.45 |
| S-TR (2-stream) | 34.7 | 23.24 |

Benchmark: skeleton-based-action-recognition-on-ntu-rgbd

| Method | Accuracy (CS) | Accuracy (CV) | GFLOPs per prediction |
| --- | --- | --- | --- |
| CoAGCN* (2-stream) | 86.0 | 93.1 | 0.44 |
| CoST-GCN* (2-stream) | 88.3 | 95 | 0.32 |
| CoS-TR* | 86.3 | 92.4 | 0.15 |
| CoS-TR* (2-stream) | 88.9 | 94.8 | 0.3 |
| ST-GCN | 86 | 93.4 | 16.73 |
| CoAGCN* | 84.1 | 92.6 | not listed |
| CoST-GCN* | 86.3 | 93.8 | 0.16 |

Benchmark: skeleton-based-action-recognition-on-ntu-rgbd-1

| Method | Accuracy (Cross-Setup) | Accuracy (Cross-Subject) | GFLOPs per prediction |
| --- | --- | --- | --- |
| S-TR (1-stream) | 81.8 | 80.2 | 16.2 |
| ST-GCN (1-stream) | not listed | 79 | 16.73 |
| AGCN (1-stream) | 80.7 | 79.7 | 18.69 |
| CoS-TR* (2-stream) | 86.1 | 84.8 | 0.3 |
| CoST-GCN* (1-stream) | 81.6 | 79.4 | 0.16 |
| CoST-GCN* (2-stream) | 85.5 | 84.0 | 0.32 |
| CoAGCN* (2-stream) | 82 | 80.4 | 0.44 |
| CoS-TR* (1-stream) | 81.7 | 79.7 | 0.15 |
| AGCN (2-stream) | 85.4 | 84 | 37.38 |
| CoAGCN* (1-stream) | 79.1 | 77.3 | 0.22 |
| ST-GCN (2-stream) | 85.1 | 83.7 | 33.46 |
| S-TR (2-stream) | 86.2 | 84.8 | 32.4 |
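
The GFLOPs figures listed above can be turned into reduction factors with simple division. The sketch below does this for the ST-GCN family on the Kinetics benchmark, using values copied from the listing; the helper name `reduction` and the pairing of each baseline with its continual counterparts are choices made here for illustration. Because the listed values are rounded to two decimals, the resulting ratios are approximate.

```python
# GFLOPs per prediction, copied from the Kinetics rows of the listing above.
gflops = {
    "ST-GCN (1-stream)": 12.04,
    "CoST-GCN (1-stream)": 0.16,
    "CoST-GCN* (1-stream)": 0.11,
    "ST-GCN (2-stream)": 24.09,
    "CoST-GCN (2-stream)": 0.32,
    "CoST-GCN* (2-stream)": 0.22,
}

# Each baseline paired with the continual variants that share its stream count.
pairs = [
    ("ST-GCN (1-stream)", "CoST-GCN (1-stream)"),
    ("ST-GCN (1-stream)", "CoST-GCN* (1-stream)"),
    ("ST-GCN (2-stream)", "CoST-GCN (2-stream)"),
    ("ST-GCN (2-stream)", "CoST-GCN* (2-stream)"),
]


def reduction(baseline, continual):
    """How many times fewer GFLOPs the continual variant needs per prediction."""
    return gflops[baseline] / gflops[continual]


for baseline, continual in pairs:
    print(f"{baseline} -> {continual}: {reduction(baseline, continual):.1f}x")
```

The largest ratio among these pairs is roughly 109, which is in line with the "up to 109 times" figure in the abstract; whether the abstract's figure was derived from exactly these rows is an assumption.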
