
Abstract
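Existing zero-shot skeleton-based action recognition methods use projection networks to learn a shared latent space of skeleton features and semantic embeddings. Action recognition datasets are inherently imbalanced: skeleton sequences vary widely while class labels stay constant, which makes alignment difficult. To address this imbalance, we propose SA-DVAE (Semantic Alignment via Disentangled Variational Autoencoders). The method first applies feature disentanglement to separate skeleton features into two independent parts, one semantic-related and the other semantic-irrelevant, so that skeleton and semantic features can be aligned more closely. We implement this idea with a pair of modality-specific variational autoencoders coupled with a total correction penalty. We conduct experiments on three benchmark datasets, NTU RGB+D, NTU RGB+D 120, and PKU-MMD, and the results show that SA-DVAE improves performance over existing methods. The code is available at https://github.com/pha123661/SA-DVAE.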
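The disentanglement idea in the abstract can be illustrated with a small sketch. It is not taken from the SA-DVAE repository: the function names, the latent sizes, the squared-distance alignment cost, and all numeric values are assumptions made for illustration, and the penalty term mentioned in the abstract is not modelled. The sketch only shows a skeleton latent being sampled, split into a semantic-related and a semantic-irrelevant part, and compared with a text latent through the semantic-related part alone.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), element by element."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I))."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, logvar))


def split_latent(z, semantic_dim):
    """Split a skeleton latent into (semantic-related, semantic-irrelevant)."""
    if not 0 < semantic_dim < len(z):
        raise ValueError("semantic_dim must leave both parts non-empty")
    return z[:semantic_dim], z[semantic_dim:]


def squared_distance(a, b):
    """Toy alignment cost between two latents of equal length (an assumption)."""
    if len(a) != len(b):
        raise ValueError("latents must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical encoder outputs for one skeleton sample (4-D latent) and its
# class-label text (2-D latent); real values would come from trained
# modality-specific encoders.
rng = random.Random(0)
skel_mu, skel_logvar = [0.5, -0.2, 1.0, 0.3], [-1.0, -1.0, -1.0, -1.0]
text_mu, text_logvar = [0.4, -0.1], [-1.0, -1.0]

z_skel = reparameterize(skel_mu, skel_logvar, rng)
z_text = reparameterize(text_mu, text_logvar, rng)
z_related, z_irrelevant = split_latent(z_skel, semantic_dim=2)

# Only the semantic-related part is compared with the text latent.
align_cost = squared_distance(z_related, z_text)
kl_cost = kl_to_standard_normal(skel_mu, skel_logvar)
```

Keeping the semantic-irrelevant part out of the alignment cost is the point of the split: variation in the skeleton sequence that the class label cannot explain is left free rather than forced toward a constant text embedding.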
Code Repositories
pha123661/SA-DVAE
Official
PyTorch
Mentioned in GitHub
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| generalized-zero-shot-skeletal-action | SA-DVAE + augmented text | Random Split Harmonic Mean: 75.51 |
| generalized-zero-shot-skeletal-action | SA-DVAE | Harmonic Mean (12 unseen classes): 42.56; Harmonic Mean (5 unseen classes): 66.27; Random Split Harmonic Mean: 75.27 |
| generalized-zero-shot-skeletal-action-1 | SA-DVAE | Harmonic Mean (10 unseen classes): 60.42; Harmonic Mean (24 unseen classes): 44.50; Random Split Harmonic Mean: 47.54 |
| generalized-zero-shot-skeletal-action-1 | SA-DVAE + augmented text | Random Split Harmonic Mean: 50.72 |
| generalized-zero-shot-skeletal-action-2 | SA-DVAE | Random Split Harmonic Mean: 54.72 |
| zero-shot-skeletal-action-recognition-on-ntu | SA-DVAE | Accuracy (12 unseen classes): 41.38; Accuracy (5 unseen classes): 82.37; Random Split Accuracy: 84.20 |
| zero-shot-skeletal-action-recognition-on-ntu | SA-DVAE + augmented text | Random Split Accuracy: 87.61 |
| zero-shot-skeletal-action-recognition-on-ntu-1 | SA-DVAE | Accuracy (10 unseen classes): 68.77; Accuracy (24 unseen classes): 46.12; Random Split Accuracy: 50.67 |
| zero-shot-skeletal-action-recognition-on-ntu-1 | SA-DVAE + augmented text | Random Split Accuracy: 57.16 |
| zero-shot-skeletal-action-recognition-on-pku | SA-DVAE | Random Split Accuracy: 66.54 |
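The table reports a harmonic mean for the generalized zero-shot benchmarks without defining it. A minimal sketch follows, assuming the usual generalized zero-shot convention in which the harmonic mean combines accuracy on seen classes with accuracy on unseen classes. The function name and the example inputs are hypothetical; the table does not list the per-split seen and unseen accuracies.

```python
def harmonic_mean(seen_acc, unseen_acc):
    """Harmonic mean of seen-class and unseen-class accuracy.

    Both inputs are expected on the same scale (for example, percentages).
    """
    if seen_acc < 0 or unseen_acc < 0:
        raise ValueError("accuracies must be non-negative")
    total = seen_acc + unseen_acc
    if total == 0:
        # Both accuracies are zero; define the harmonic mean as zero.
        return 0.0
    return 2.0 * seen_acc * unseen_acc / total


# Hypothetical accuracies, not values from the table above.
balanced = harmonic_mean(50.0, 50.0)
skewed = harmonic_mean(80.0, 20.0)
```

The harmonic mean drops sharply when the two accuracies diverge, so a model that does well only on seen classes scores lower than its arithmetic average would suggest.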