HyperAI



Interactive Spatiotemporal Token Attention Network for Skeleton-based General Interactive Action Recognition

Wen Yuhang; Tang Zixuan; Pang Yunsheng; Ding Beichen; Liu Mengyuan


Abstract

Recognizing interactive actions plays an important role in human-robot interaction and collaboration. Previous methods use late fusion and co-attention mechanisms to capture interactive relations, but these either have limited learning capability or adapt inefficiently to a larger number of interacting entities. Because they assume that the priors of each entity are already known, they also lack evaluation in a more general setting that addresses the diversity of subjects. To address these problems, we propose an Interactive Spatiotemporal Token Attention Network (ISTA-Net), which simultaneously models spatial, temporal, and interactive relations. Specifically, our network contains a tokenizer that partitions the input into Interactive Spatiotemporal Tokens (ISTs), a unified way to represent the motions of multiple diverse entities. By extending the entity dimension, ISTs provide better interactive representations. To learn jointly along the three dimensions of ISTs, multi-head self-attention blocks integrated with 3D convolutions are designed to capture inter-token correlations. When modeling these correlations, a strict entity ordering is usually irrelevant for recognizing interactive actions. To this end, Entity Rearrangement is proposed to eliminate the ordering of interchangeable entities in ISTs. Extensive experiments on four datasets verify the effectiveness of ISTA-Net, which outperforms state-of-the-art methods. Our code is publicly available at https://github.com/Necolizer/ISTA-Net.
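The abstract names the tokenizer and Entity Rearrangement without giving their details, so the following is only a minimal sketch of one plausible reading, written in plain Python rather than the PyTorch used by the official repository. The data layout `seq[t][v][e]`, the fixed non-overlapping temporal window, the `groups` argument, and the helper names `tokenize` and `rearrange_entities` are all hypothetical and are not taken from the ISTA-Net code. Tokens are laid out on a time x joint x entity grid to mirror the three dimensions the abstract mentions, and Entity Rearrangement is modeled here as a random permutation applied only among entities declared interchangeable.

```python
import random
from typing import List, Sequence

# Hypothetical layout (assumption): seq[t][v][e] is a list of C coordinates
# for frame t, joint v, entity e (e.g. C=3 for x, y, z).
Skeleton = List[List[List[List[float]]]]


def tokenize(seq: Skeleton, window: int) -> Skeleton:
    """Split the time axis into non-overlapping windows of `window` frames.

    Returns tokens[i][v][e]: the window*C values of joint v of entity e in
    window i, so the token grid spans time x joints x entities.
    Trailing frames that do not fill a whole window are dropped (assumption).
    """
    num_windows = len(seq) // window
    num_joints = len(seq[0])
    num_entities = len(seq[0][0])
    tokens = []
    for i in range(num_windows):
        frames = seq[i * window:(i + 1) * window]
        tokens.append([
            [
                # Concatenate this joint/entity's channels across the window.
                [c for frame in frames for c in frame[v][e]]
                for e in range(num_entities)
            ]
            for v in range(num_joints)
        ])
    return tokens


def rearrange_entities(tokens: Skeleton,
                       groups: Sequence[Sequence[int]],
                       rng: random.Random) -> Skeleton:
    """Shuffle entity slots within each group of interchangeable entities.

    `groups` (hypothetical parameter) lists entity indices that may swap
    places, e.g. [[0, 1]] for two people; other entities keep their slot.
    """
    num_entities = len(tokens[0][0])
    perm = list(range(num_entities))
    for group in groups:
        shuffled = list(group)
        rng.shuffle(shuffled)
        for dst, src in zip(group, shuffled):
            perm[dst] = src  # slot `dst` now holds entity `src`
    # One permutation is applied to every token, so each entity's trajectory
    # stays intact; only the entity slot it occupies changes.
    return [[[row[perm[k]] for k in range(num_entities)] for row in window]
            for window in tokens]
```

In a training loop one would presumably draw a fresh permutation per sample, so the model cannot rely on a fixed entity order; whether the official implementation does exactly this is not stated in the abstract.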

Code Repositories

Necolizer/ISTA-Net (official PyTorch implementation)

Benchmarks

Benchmark (method) and metrics

3d-action-recognition-on-assembly101 (ISTA-Net)
Actions Top-1: 28.07
Object Top-1: 31.69
Verbs Top-1: 62.66

action-recognition-on-h2o-2-hands-and-objects (ISTA-Net)
Actions Top-1: 89.09
Hand Pose: 3D
Object Label: No
Object Pose: Yes
RGB: No

human-interaction-recognition-on-ntu-rgb-d-1 (ISTA-Net)
Accuracy (Cross-Setup): 91.7
Accuracy (Cross-Subject): 90.5

human-interaction-recognition-on-sbu (ISTA-Net)
Accuracy: 98.51±1.47

skeleton-based-action-recognition-on-h2o-2 (ISTA-Net)
Accuracy: 89.09±1.21
