SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders
Li Sheng-Wei ; Wei Zi-Xiang ; Chen Wei-Jie ; Yu Yi-Hsin ; Yang Chih-Yuan ; Hsu Jane Yung-jen

Abstract
Existing zero-shot skeleton-based action recognition methods utilize projection networks to learn a shared latent space of skeleton features and semantic embeddings. The inherent imbalance in action recognition datasets, characterized by variable skeleton sequences yet constant class labels, presents significant challenges for alignment. To address the imbalance, we propose SA-DVAE -- Semantic Alignment via Disentangled Variational Autoencoders, a method that first adopts feature disentanglement to separate skeleton features into two independent parts -- one semantic-related and the other semantic-irrelevant -- to better align skeleton and semantic features. We implement this idea via a pair of modality-specific variational autoencoders coupled with a total correlation penalty. We conduct experiments on three benchmark datasets: NTU RGB+D, NTU RGB+D 120 and PKU-MMD, and our experimental results show that SA-DVAE produces improved performance over existing methods. The code is available at https://github.com/pha123661/SA-DVAE.
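The abstract does not spell out the penalty, so the following is only a sketch of how total correlation between two latent parts is commonly defined. The symbols $z_r$ (semantic-related part), $z_i$ (semantic-irrelevant part) and $q$ (the encoder's aggregate posterior) are assumptions introduced here for illustration, not notation taken from the source.

```latex
\mathrm{TC}(z_r, z_i)
  = D_{\mathrm{KL}}\!\left( q(z_r, z_i) \,\middle\|\, q(z_r)\, q(z_i) \right)
  = \mathbb{E}_{q(z_r, z_i)}\!\left[ \log \frac{q(z_r, z_i)}{q(z_r)\, q(z_i)} \right]
```

Under this definition the quantity is zero exactly when the two parts are statistically independent, which is why penalizing it would encourage the separation the abstract describes.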
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| generalized-zero-shot-skeletal-action | SA-DVAE + augmented text | Random Split Harmonic Mean: 75.51 |
| generalized-zero-shot-skeletal-action | SA-DVAE | Harmonic Mean (12 unseen classes): 42.56; Harmonic Mean (5 unseen classes): 66.27; Random Split Harmonic Mean: 75.27 |
| generalized-zero-shot-skeletal-action-1 | SA-DVAE | Harmonic Mean (10 unseen classes): 60.42; Harmonic Mean (24 unseen classes): 44.50; Random Split Harmonic Mean: 47.54 |
| generalized-zero-shot-skeletal-action-1 | SA-DVAE + augmented text | Random Split Harmonic Mean: 50.72 |
| generalized-zero-shot-skeletal-action-2 | SA-DVAE | Random Split Harmonic Mean: 54.72 |
| zero-shot-skeletal-action-recognition-on-ntu | SA-DVAE | Accuracy (12 unseen classes): 41.38; Accuracy (5 unseen classes): 82.37; Random Split Accuracy: 84.20 |
| zero-shot-skeletal-action-recognition-on-ntu | SA-DVAE + augmented text | Random Split Accuracy: 87.61 |
| zero-shot-skeletal-action-recognition-on-ntu-1 | SA-DVAE | Accuracy (10 unseen classes): 68.77; Accuracy (24 unseen classes): 46.12; Random Split Accuracy: 50.67 |
| zero-shot-skeletal-action-recognition-on-ntu-1 | SA-DVAE + augmented text | Random Split Accuracy: 57.16 |
| zero-shot-skeletal-action-recognition-on-pku | SA-DVAE | Random Split Accuracy: 66.54 |
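
The generalized zero-shot rows above report a harmonic mean. The page does not define it, so the sketch below assumes the usual generalized zero-shot convention: the harmonic mean of accuracy on seen classes and accuracy on unseen classes. The function name and the example accuracies are hypothetical and are not taken from the paper.

```python
def harmonic_mean(seen_acc: float, unseen_acc: float) -> float:
    """Harmonic mean of seen-class and unseen-class accuracy (same units, e.g. percent).

    Assumes the common generalized zero-shot definition H = 2*S*U / (S + U).
    """
    total = seen_acc + unseen_acc
    if total == 0:
        # Both accuracies are zero; define the harmonic mean as zero.
        return 0.0
    return 2.0 * seen_acc * unseen_acc / total


# Hypothetical accuracies for illustration only, not results from the paper.
print(round(harmonic_mean(80.0, 60.0), 2))  # 68.57
```

Under this definition the score stays low unless both accuracies are high, so a model that favors seen classes is penalized.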