5 months ago

Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition

Anqi Zhu; Qiuhong Ke; Mingming Gong; James Bailey

Abstract

While remarkable progress has been made on supervised skeleton-based action recognition, the challenge of zero-shot recognition remains relatively unexplored. In this paper, we argue that relying solely on aligning label-level semantics and global skeleton features is insufficient to effectively transfer locally consistent visual knowledge from seen to unseen classes. To address this limitation, we introduce Part-aware Unified Representation between Language and Skeleton (PURLS) to explore visual-semantic alignment at both local and global scales. PURLS introduces a new prompting module and a novel partitioning module to generate aligned textual and visual representations across different levels. The former leverages a pre-trained GPT-3 to infer refined descriptions of the global and local (body-part-based and temporal-interval-based) movements from the original action labels. The latter employs an adaptive sampling strategy to group visual features from all body joint movements that are semantically relevant to a given description. Our approach is evaluated on various skeleton/language backbones and three large-scale datasets: NTU-RGB+D 60, NTU-RGB+D 120, and a newly curated dataset, Kinetics-skeleton 200. The results demonstrate the universality and superior performance of PURLS, which surpasses prior skeleton-based solutions and standard baselines from other domains. The source code is available at https://github.com/azzh1/PURLS.
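
The idea of aligning visual and textual representations at both global and local scales can be illustrated with a toy sketch. This is not the PURLS implementation: the function names (`cosine`, `zero_shot_predict`), the level names (`global`, `hands`), the two-dimensional vectors, and the plain averaging of per-level similarities are all assumptions made for illustration. In the paper, the features would come from skeleton and language backbones, and the grouping of joint features is learned by the partitioning module.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def zero_shot_predict(
    visual: Dict[str, Vector],
    text: Dict[str, Dict[str, Vector]],
) -> str:
    """Pick the unseen class whose descriptions best match the visual features.

    visual: level name -> visual feature for one skeleton sequence.
    text:   class label -> level name -> text embedding of that level's description.
    The score of a class is the mean cosine similarity over all levels
    (a hypothetical fusion rule chosen for simplicity).
    """
    scores = {}
    for label, per_level in text.items():
        sims = [cosine(visual[level], per_level[level]) for level in visual]
        scores[label] = sum(sims) / len(sims)
    return max(scores, key=scores.get)


# Toy data: hypothetical 2-D features for one sample and two unseen classes.
visual = {"global": [1.0, 0.0], "hands": [0.0, 1.0]}
text = {
    "clapping": {"global": [1.0, 0.1], "hands": [0.1, 1.0]},
    "kicking": {"global": [1.0, 0.0], "hands": [1.0, 0.0]},
}

prediction = zero_shot_predict(visual, text)
print(prediction)
```

In this toy setup the global feature alone slightly favours "kicking", while adding the local "hands" level shifts the prediction to "clapping". This mirrors the abstract's argument that global alignment alone may be insufficient.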

Code Repositories

azzh1/purls
Official
pytorch

Benchmarks

| Benchmark | Methodology | Metrics |
| --- | --- | --- |
| zero-shot-skeletal-action-recognition-on-ntu | PURLS | Accuracy (12 unseen classes): 40.99; Accuracy (5 unseen classes): 79.23 |
| zero-shot-skeletal-action-recognition-on-ntu-1 | PURLS | Accuracy (10 unseen classes): 71.95; Accuracy (24 unseen classes): 52.01 |
