
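Abstract
Many approaches to generalized zero-shot learning rely on a cross-modal mapping between the image feature space and the class embedding space. Because labeled images are expensive to obtain, one viable direction is to augment the dataset by generating either images or image features. However, the former tends to lose fine-grained details, while the latter requires learning a mapping associated with class embeddings. In this work, we take feature generation one step further and propose a model in which modality-specific aligned variational autoencoders learn a latent space shared by image features and class embeddings. The latent features thereby retain the discriminative information needed about images and classes, and we train a softmax classifier on them. The key to our approach is that we align the distributions learned from image data and from side information to construct latent features that carry the essential multi-modal information about unseen classes. We evaluate the learned latent features on several benchmark datasets (CUB, SUN, AWA1, and AWA2), establishing a new state of the art on generalized zero-shot learning and achieving significant improvements on few-shot learning. Moreover, experiments on ImageNet with various zero-shot splits show that our latent features generalize well in large-scale settings.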
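
To make the idea of aligning the two learned distributions concrete, here is a minimal sketch of one possible alignment measure. The abstract does not state which objective is used, so everything below is an assumption: each encoder is taken to output a diagonal Gaussian (a mean vector and a log-variance vector), and the closed-form 2-Wasserstein distance between the two Gaussians serves as the alignment penalty. The function name `wasserstein2_diag_gaussians` and its parameters are hypothetical.

```python
import math


def wasserstein2_diag_gaussians(mu_a, logvar_a, mu_b, logvar_b):
    """2-Wasserstein distance between two diagonal Gaussians.

    Assumption (not from the abstract): each modality-specific encoder
    outputs a mean vector and a log-variance vector of equal length.
    For diagonal covariances the distance has the closed form
    sqrt(||mu_a - mu_b||^2 + ||sigma_a - sigma_b||^2).
    """
    if not (len(mu_a) == len(logvar_a) == len(mu_b) == len(logvar_b)):
        raise ValueError("all inputs must have the same length")
    # Squared distance between the means.
    mean_term = sum((ma - mb) ** 2 for ma, mb in zip(mu_a, mu_b))
    # Squared distance between the standard deviations (sigma = exp(logvar / 2)).
    std_term = sum(
        (math.exp(0.5 * la) - math.exp(0.5 * lb)) ** 2
        for la, lb in zip(logvar_a, logvar_b)
    )
    return math.sqrt(mean_term + std_term)


# Hypothetical latent statistics for one class: one from an image encoder,
# one from a class-embedding encoder.
image_mu, image_logvar = [0.0, 0.0], [0.0, 0.0]
class_mu, class_logvar = [3.0, 4.0], [0.0, 0.0]
alignment_penalty = wasserstein2_diag_gaussians(
    image_mu, image_logvar, class_mu, class_logvar
)
```

In a training loop, a penalty of this kind would be added to each autoencoder's usual reconstruction and KL terms, so that the image-side and class-side latent distributions of the same class are pulled together.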
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| generalized-few-shot-learning-on-awa2 | CA-VAE | Per-Class Accuracy: 1-shot 64.0; 2-shot 71.3; 5-shot 76.6; 10-shot 79.0 |
| generalized-few-shot-learning-on-awa2 | DA-VAE | Per-Class Accuracy: 1-shot 68.0; 2-shot 73.0; 5-shot 75.6; 10-shot 76.8 |
| generalized-few-shot-learning-on-cub | CADA-VAE | Per-Class Accuracy: 1-shot 55.2; 2-shot 59.2; 5-shot 63.0; 10-shot 64.9; 20-shot 66.0 |
| generalized-few-shot-learning-on-cub | CA-VAE | Per-Class Accuracy: 1-shot 50.6; 2-shot 54.4; 5-shot 59.6; 10-shot 62.2 |
| generalized-few-shot-learning-on-cub | DA-VAE | Per-Class Accuracy: 1-shot 49.2; 2-shot 54.6; 5-shot 58.8; 10-shot 60.8 |
| generalized-few-shot-learning-on-sun | CADA-VAE | Per-Class Accuracy: 1-shot 37.8; 2-shot 41.4; 5-shot 44.2; 10-shot 45.8 |
| generalized-few-shot-learning-on-sun | CA-VAE | Per-Class Accuracy: 1-shot 37.8; 2-shot 40.8; 5-shot 43.6; 10-shot 45.1 |
| generalized-few-shot-learning-on-sun | DA-VAE | Per-Class Accuracy: 1-shot 40.6; 2-shot 43.0; 5-shot 46.0; 10-shot 47.6 |
| long-tail-learning-with-class-descriptors-on | CADA-VAE | Long-Tailed Accuracy: 57.4; Per-Class Accuracy: 48.3 |
| long-tail-learning-with-class-descriptors-on-1 | CADA-VAE | Long-Tailed Accuracy: 35.1; Per-Class Accuracy: 32.8 |
| long-tail-learning-with-class-descriptors-on-2 | CADA-VAE | Long-Tailed Accuracy: 89.5; Per-Class Accuracy: 73.5 |
| long-tail-learning-with-class-descriptors-on-3 | CADA-VAE | Per-Class Accuracy: 49.3 |
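
The table reports per-class accuracy without defining it. As a rough illustration, the sketch below uses the common definition: compute accuracy separately for each ground-truth class, then average over classes, so that frequent classes do not dominate the score. The function name `per_class_accuracy` and the toy labels are hypothetical, and the exact evaluation protocol behind the numbers above may differ.

```python
from collections import defaultdict


def per_class_accuracy(y_true, y_pred):
    """Mean of the per-class accuracies (each class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    if not total:
        raise ValueError("no samples given")
    # Average the per-class hit rates over the classes present in y_true.
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy example: class 0 has four samples (all correct),
# class 1 has one sample (misclassified).
y_true = [0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 2]
score = per_class_accuracy(y_true, y_pred)
```

On this toy input the plain sample-level accuracy is 4/5, whereas the per-class average is (1.0 + 0.0) / 2, which shows why the two metrics can diverge on imbalanced data.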