SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition
Do Jeonghyeok; Kim Munchurl

Abstract
Skeleton-based action recognition, which classifies human actions based on the coordinates of joints and their connectivity within skeleton data, is widely utilized in various scenarios. While Graph Convolutional Networks (GCNs) have been proposed for skeleton data represented as graphs, they suffer from limited receptive fields constrained by joint connectivity. To address this limitation, recent advancements have introduced transformer-based methods. However, capturing correlations between all joints in all frames requires substantial memory resources. To alleviate this, we propose a novel approach called Skeletal-Temporal Transformer (SkateFormer) that partitions joints and frames based on different types of skeletal-temporal relation (Skate-Type) and performs skeletal-temporal self-attention (Skate-MSA) within each partition. We categorize the key skeletal-temporal relations for action recognition into a total of four distinct types. These types combine (i) two skeletal relation types based on physically neighboring and distant joints, and (ii) two temporal relation types based on neighboring and distant frames. Through this partition-specific attention strategy, our SkateFormer can selectively focus on key joints and frames crucial for action recognition in an action-adaptive manner with efficient computation. Extensive experiments on various benchmark datasets validate that our SkateFormer outperforms recent state-of-the-art methods.
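The partition-then-attend idea described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation. The function names `skate_partition` and `self_attention` are hypothetical, and the sketch assumes that joints are pre-ordered so each body part occupies a contiguous block, that frames and joints divide evenly into segments and parts, and that "distant" joints or frames are formed by taking one element from each part or segment. The attention shown is single-head with no learned projections.

```python
import numpy as np

def skate_partition(x, num_segments, num_parts, skate_type):
    """Group a (T, V, C) skeleton sequence into partitions of one Skate-Type.

    Assumed layout (illustrative only): frame t = m * N + n, joint v = k * L + l,
    with M segments of N frames and K body parts of L joints.
    Returns an array of shape (groups, tokens_per_group, C).
    """
    T, V, C = x.shape
    M, K = num_segments, num_parts
    assert T % M == 0 and V % K == 0, "frames/joints must split evenly"
    N, L = T // M, V // K
    g = x.reshape(M, N, K, L, C)
    if skate_type == 1:    # neighboring joints, neighboring frames
        return g.transpose(0, 2, 1, 3, 4).reshape(M * K, N * L, C)
    if skate_type == 2:    # distant joints, neighboring frames
        return g.transpose(0, 3, 1, 2, 4).reshape(M * L, N * K, C)
    if skate_type == 3:    # neighboring joints, distant frames
        return g.transpose(1, 2, 0, 3, 4).reshape(N * K, M * L, C)
    if skate_type == 4:    # distant joints, distant frames
        return g.transpose(1, 3, 0, 2, 4).reshape(N * L, M * K, C)
    raise ValueError("skate_type must be 1, 2, 3, or 4")

def self_attention(tokens):
    """Plain scaled dot-product self-attention within each group.

    tokens: (groups, P, C). Queries, keys, and values are the tokens
    themselves (no learned weights), which keeps the sketch short.
    """
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(tokens.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ tokens

# Toy input: 8 frames, 6 joints, 2 channels; 2 segments and 3 body parts.
x = np.arange(8 * 6 * 2, dtype=float).reshape(8, 6, 2)
outputs = {t: self_attention(skate_partition(x, 2, 3, t)) for t in (1, 2, 3, 4)}
```

In this toy setting, attention over all 48 joint-frame tokens would score 48 × 48 = 2304 token pairs, whereas the type 1 partition scores 6 groups × 8 × 8 = 384 pairs, which illustrates why restricting attention to partitions reduces memory use.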
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| human-interaction-recognition-on-ntu-rgb-d | SkateFormer | Accuracy (Cross-Subject): 97.1; Accuracy (Cross-View): 99.3 |
| human-interaction-recognition-on-ntu-rgb-d-1 | SkateFormer | Accuracy (Cross-Setup): 93.2; Accuracy (Cross-Subject): 92.3 |
| skeleton-based-action-recognition-on-n-ucla | SkateFormer | Accuracy: 98.3 |
| skeleton-based-action-recognition-on-ntu-rgbd | SkateFormer | Accuracy (CS): 93.5; Accuracy (CV): 97.8; Ensembled Modalities: 4 |
| skeleton-based-action-recognition-on-ntu-rgbd-1 | SkateFormer | Accuracy (Cross-Setup): 91.4; Accuracy (Cross-Subject): 89.8; Ensembled Modalities: 4 |