5 months ago

SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition

Do Jeonghyeok ; Kim Munchurl

Abstract

Skeleton-based action recognition, which classifies human actions based on the coordinates of joints and their connectivity within skeleton data, is widely utilized in various scenarios. While Graph Convolutional Networks (GCNs) have been proposed for skeleton data represented as graphs, they suffer from limited receptive fields constrained by joint connectivity. To address this limitation, recent advancements have introduced transformer-based methods. However, capturing correlations between all joints in all frames requires substantial memory resources. To alleviate this, we propose a novel approach called Skeletal-Temporal Transformer (SkateFormer) that partitions joints and frames based on different types of skeletal-temporal relation (Skate-Type) and performs skeletal-temporal self-attention (Skate-MSA) within each partition. We categorize the key skeletal-temporal relations for action recognition into a total of four distinct types. These types combine (i) two skeletal relation types based on physically neighboring and distant joints, and (ii) two temporal relation types based on neighboring and distant frames. Through this partition-specific attention strategy, our SkateFormer can selectively focus on key joints and frames crucial for action recognition in an action-adaptive manner with efficient computation. Extensive experiments on various benchmark datasets validate that our SkateFormer outperforms recent state-of-the-art methods.
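
The partitioning idea in the abstract can be illustrated with a small index-only sketch. This is not the authors' implementation: the function names (chunks, strided, skate_partitions, attention_pairs), the group sizes, and the assumption that joints are ordered so that contiguous indices are physically neighboring are all hypothetical choices made for illustration. It builds the four combinations of neighboring/distant joints and neighboring/distant frames, then counts how many pairwise attention scores each needs compared with attention over all joints in all frames.

```python
from itertools import product


def chunks(indices, size):
    """Contiguous groups: stands in for 'neighboring' joints or frames."""
    assert len(indices) % size == 0, "sketch assumes an even split"
    return [indices[i:i + size] for i in range(0, len(indices), size)]


def strided(indices, size):
    """Evenly spaced groups: stands in for 'distant' joints or frames."""
    assert len(indices) % size == 0, "sketch assumes an even split"
    n_groups = len(indices) // size
    return [indices[g::n_groups] for g in range(n_groups)]


def skate_partitions(num_frames, num_joints, frame_group, joint_group):
    """Return {(joint_kind, frame_kind): list of partitions of (frame, joint) tokens}."""
    frames = list(range(num_frames))
    joints = list(range(num_joints))  # assumed ordered so neighbors are contiguous
    frame_splits = {"neighbor": chunks(frames, frame_group),
                    "distant": strided(frames, frame_group)}
    joint_splits = {"neighbor": chunks(joints, joint_group),
                    "distant": strided(joints, joint_group)}
    partitions = {}
    for j_kind, f_kind in product(joint_splits, frame_splits):
        partitions[(j_kind, f_kind)] = [
            [(t, v) for t in f_grp for v in j_grp]
            for f_grp in frame_splits[f_kind]
            for j_grp in joint_splits[j_kind]
        ]
    return partitions


def attention_pairs(partition_list):
    """Pairwise scores needed when attention is computed inside each partition only."""
    return sum(len(p) ** 2 for p in partition_list)


# Toy sizes (hypothetical): 8 frames, 6 joints, groups of 4 frames and 3 joints.
parts = skate_partitions(num_frames=8, num_joints=6, frame_group=4, joint_group=3)
full_pairs = (8 * 6) ** 2  # attention over every joint in every frame
for kind, plist in parts.items():
    print(kind, "partitions:", len(plist), "pairs:", attention_pairs(plist),
          "vs full:", full_pairs)
```

With these toy sizes, each of the four types splits the 48 tokens into 4 partitions of 12 tokens, so attention inside partitions needs 576 pairwise scores instead of 2304. How the real model forms its groups and combines the four types is not specified in the abstract and is not modeled here.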

Code Repositories

KAIST-VICLab/SkateFormer (official, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
human-interaction-recognition-on-ntu-rgb-d | SkateFormer | Accuracy (Cross-Subject): 97.1; Accuracy (Cross-View): 99.3
human-interaction-recognition-on-ntu-rgb-d-1 | SkateFormer | Accuracy (Cross-Setup): 93.2; Accuracy (Cross-Subject): 92.3
skeleton-based-action-recognition-on-n-ucla | SkateFormer | Accuracy: 98.3
skeleton-based-action-recognition-on-ntu-rgbd | SkateFormer | Accuracy (CS): 93.5; Accuracy (CV): 97.8; Ensembled Modalities: 4
skeleton-based-action-recognition-on-ntu-rgbd-1 | SkateFormer | Accuracy (Cross-Setup): 91.4; Accuracy (Cross-Subject): 89.8; Ensembled Modalities: 4
