Vertex Feature Encoding and Hierarchical Temporal Modeling in a Spatial-Temporal Graph Convolutional Network for Action Recognition
Konstantinos Papadopoulos Enjie Ghorbel Djamila Aouada Björn Ottersten

Abstract
This paper extends the Spatial-Temporal Graph Convolutional Network (ST-GCN) for skeleton-based action recognition by introducing two novel modules: the Graph Vertex Feature Encoder (GVFE) and the Dilated Hierarchical Temporal Convolutional Network (DH-TCN). The GVFE module learns vertex features suited to action recognition by encoding raw skeleton data into a new feature space. The DH-TCN module captures both short-term and long-term temporal dependencies using a hierarchical dilated convolutional network. Experiments have been conducted on the challenging NTU RGB-D 60 and NTU RGB-D 120 datasets. The results show that our method is competitive with state-of-the-art approaches while using fewer layers and parameters, thus reducing the training time and memory required.
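The abstract says only that DH-TCN stacks dilated temporal convolutions hierarchically to cover short and long time spans. The following is a minimal pure-Python sketch of that general idea, not the authors' module. Every setting is an illustrative assumption: a 1D convolution over a single per-frame feature, an odd kernel size of 3, dilations doubling per level (1, 2, 4), zero padding, ReLU, and keeping every level's output as the "hierarchy". The helper names `dilated_conv1d`, `hierarchical_dilated_tcn` and `receptive_field` are hypothetical. With kernel size k and dilations d_l, the top level sees 1 + Σ(k − 1)·d_l frames, so three layers already span 15 frames.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], weights: Sequence[float], dilation: int) -> List[float]:
    """'Same'-length 1D dilated convolution over time with zero padding.

    Assumes an odd kernel so the taps are centred on frame t.
    """
    k = len(weights)
    if k % 2 == 0:
        raise ValueError("kernel size must be odd in this sketch")
    centre = (k - 1) // 2
    T = len(x)
    out = []
    for t in range(T):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t + (i - centre) * dilation  # taps spaced `dilation` frames apart
            if 0 <= idx < T:  # frames outside the sequence count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def hierarchical_dilated_tcn(
    x: Sequence[float],
    weights_per_level: Sequence[Sequence[float]],
    dilations: Sequence[int],
) -> List[List[float]]:
    """Stack dilated convolutions (with ReLU) and return every level's output.

    Lower levels capture short-term context, higher levels long-term context
    (hypothetical structure, assumed for illustration only).
    """
    outputs = []
    h = list(x)
    for w, d in zip(weights_per_level, dilations):
        h = [max(0.0, v) for v in dilated_conv1d(h, w, d)]  # conv + ReLU
        outputs.append(h)
    return outputs


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input frames seen by one output frame of the top level."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    impulse = [0.0] * 20
    impulse[10] = 1.0  # a single "event" at frame 10
    levels = hierarchical_dilated_tcn(impulse, [[1.0, 1.0, 1.0]] * 3, [1, 2, 4])
    for lvl, out in enumerate(levels, start=1):
        print(lvl, sum(1 for v in out if v != 0.0))  # frames influenced per level
    print(receptive_field(3, [1, 2, 4]))  # → 15
```

Running the impulse example shows the influenced span growing from 3 to 7 to 15 frames across the three levels. This exponential widening with constant per-layer cost is the usual motivation for dilation over simply stacking more undilated layers.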
Benchmarks
| Benchmark | Methodology | Metrics (accuracy, %) |
|---|---|---|
| action-recognition-in-videos-on-ntu-rgbd-120 | ST-GCN + AS-GCN w/DH-TCN | Cross-Setup: 78.3; Cross-Subject: 79.2 |
| skeleton-based-action-recognition-on-ntu-rgbd | GVFE + AS-GCN with DH-TCN | Cross-Subject (CS): 85.3; Cross-View (CV): 92.8 |
| skeleton-based-action-recognition-on-ntu-rgbd-1 | GVFE + AS-GCN with DH-TCN | Cross-Setup: 79.8; Cross-Subject: 78.3 |