Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-based Action Recognition
Tailin Chen Desen Zhou Jian Wang Shidong Wang Yu Guan Xuming He Errui Ding

Abstract
Skeleton-based action recognition remains a core challenge in human-centred scene understanding because human motion spans multiple granularities and varies widely. Existing approaches typically employ a single neural representation for all motion patterns, which makes it difficult to capture fine-grained action classes given limited training data. To address these problems, we propose a novel multi-granular spatio-temporal graph network for skeleton-based action classification that jointly models coarse- and fine-grained skeleton motion patterns. To this end, we develop a dual-head graph network consisting of two interleaved branches, which enables us to extract features at two spatio-temporal resolutions effectively and efficiently. Moreover, our network utilises a cross-head communication strategy to mutually enhance the representations of both heads. We conduct extensive experiments on three large-scale datasets, namely NTU RGB+D 60, NTU RGB+D 120, and Kinetics-Skeleton, and achieve state-of-the-art performance on all three benchmarks, which validates the effectiveness of our method.
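The two-resolution idea in the abstract can be illustrated with a toy sketch. This is not the DualHead-Net implementation: the function names (`coarsen`, `gate_fine_with_coarse`), the averaging used for pooling, the sigmoid gate, and the one-channel-per-joint layout are all assumptions made for illustration. It shows only one direction of communication (coarse to fine), whereas the abstract describes a mutual exchange between heads.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def coarsen(seq, parts, stride=2):
    """Build a coarse view: average `stride` consecutive frames
    and pool the joints of each body part (hypothetical pooling choice)."""
    coarse = []
    for t in range(0, len(seq) - stride + 1, stride):
        window = seq[t:t + stride]
        frame = []
        for joints in parts:
            vals = [f[j] for f in window for j in joints]
            frame.append(sum(vals) / len(vals))
        coarse.append(frame)
    return coarse

def gate_fine_with_coarse(fine, coarse, parts, stride=2):
    """Coarse-to-fine communication (assumed form): scale each joint by a
    sigmoid gate computed from the coarse feature of its body part."""
    part_of = {j: p for p, joints in enumerate(parts) for j in joints}
    out = []
    for t, frame in enumerate(fine):
        tc = min(t // stride, len(coarse) - 1)  # matching coarse frame
        out.append([x * sigmoid(coarse[tc][part_of[j]])
                    for j, x in enumerate(frame)])
    return out

# Toy input: 4 frames, 4 joints, one scalar feature per joint.
seq = [[0.0, 0.0, 1.0, 1.0],
       [0.0, 0.0, 1.0, 1.0],
       [2.0, 2.0, 0.0, 0.0],
       [2.0, 2.0, 0.0, 0.0]]
parts = [[0, 1], [2, 3]]  # two hypothetical body parts

coarse = coarsen(seq, parts)                      # 2 frames x 2 parts
fine = gate_fine_with_coarse(seq, coarse, parts)  # 4 frames x 4 joints
```

In this sketch the coarse head sees half the frames and one value per body part, while the fine head keeps every frame and joint. A real network would replace the averaging and gating with learned graph and temporal convolutions.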
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| skeleton-based-action-recognition-on-kinetics | DualHead-Net | Accuracy: 38.4 |
| skeleton-based-action-recognition-on-ntu-rgbd | DualHead-Net | Accuracy (CS): 92.0; Accuracy (CV): 96.6 |
| skeleton-based-action-recognition-on-ntu-rgbd-1 | DualHead-Net | Accuracy (Cross-Setup): 89.3; Accuracy (Cross-Subject): 88.2; Ensembled Modalities: 4 |