
Abstract
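We present a general method that learns to classify images without labels by leveraging pretrained feature extractors. The method rests on one key observation: samples that are nearest neighbors in the pretrained feature space are likely to share the same class label. We exploit this by training clustering heads with self-distillation, and we propose a novel objective that learns associations between image features by combining a variant of pointwise mutual information (PMI) with instance weighting. Experiments show that the proposed objective attenuates the effect of false positive pairs while efficiently exploiting the structure of the pretrained feature space. Across 17 different pretrained models, our method improves clustering accuracy over k-means by 6.1% on ImageNet and 12.2% on CIFAR100. Finally, using self-supervised vision transformers, we reach a clustering accuracy of 61.6% on ImageNet. The code is available at https://github.com/HHU-MMBS/TEMI-official-BMVC2023.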
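To make the nearest-neighbor observation concrete, the sketch below mines, for each sample, its k most similar samples in a feature space. It is only an illustration and not the authors' implementation: the function names `cosine_similarity` and `mine_nearest_neighbors`, the use of cosine similarity, and the toy 2-D features are all assumptions, and the features are assumed to be non-zero vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mine_nearest_neighbors(features, k):
    """For each sample index, return the indices of its k most similar samples."""
    neighbors = []
    for i, f_i in enumerate(features):
        # Score every other sample against sample i (the sample itself is excluded).
        sims = [(cosine_similarity(f_i, f_j), j)
                for j, f_j in enumerate(features) if j != i]
        # Sort by decreasing similarity; break ties by index for determinism.
        sims.sort(key=lambda t: (-t[0], t[1]))
        neighbors.append([j for _, j in sims[:k]])
    return neighbors

# Toy stand-in for pretrained features: two loose groups in 2-D (hypothetical data).
features = [
    [1.0, 0.1],
    [0.9, 0.2],
    [0.1, 1.0],
    [0.2, 0.9],
]
neighbors = mine_nearest_neighbors(features, k=1)
print(neighbors)
```

In a pipeline of this kind, each sample and one of its mined neighbors would be treated as a candidate positive pair. Some of those pairs will cross class boundaries, which is the false positive problem the abstract refers to. This loop is quadratic in the number of samples, so it is only suitable for small illustrations.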
Code repository
HHU-MMBS/TEMI-official-BMVC2023 (official implementation, PyTorch)
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| image-clustering-on-cifar-10 | TEMI DINO ViT-B | Accuracy: 0.945, ARI: 0.885, NMI: 0.886, Backbone: ViT-B, Train set: Train |
| image-clustering-on-cifar-10 | TEMI CLIP ViT-L (openai) | Accuracy: 0.969, ARI: 0.932, NMI: 0.926, Backbone: ViT-L, Train set: Train |
| image-clustering-on-cifar-100 | TEMI DINO ViT-B | Accuracy: 0.671, ARI: 0.533, NMI: 0.769, Train set: Train |
| image-clustering-on-cifar-100 | TEMI CLIP ViT-L (openai) | Accuracy: 0.737, ARI: 0.612, NMI: 0.799, Train set: Train |
| image-clustering-on-imagenet | TEMI DINO ViT-B | Accuracy: 58.0%, ARI: 45.9%, NMI: 81.4% |
| image-clustering-on-imagenet | TEMI MSN ViT-L | Accuracy: 61.6%, ARI: 48.4%, NMI: 82.5% |
| image-clustering-on-imagenet-100 | TEMI CLIP ViT-L (openai) | Accuracy: 0.8343, ARI: 0.7581, NMI: 0.9006 |
| image-clustering-on-imagenet-100 | TEMI MSN ViT-L | Accuracy: 0.8286, ARI: 0.7408, NMI: 0.8853 |
| image-clustering-on-imagenet-100 | TEMI DINO ViT-B | Accuracy: 0.7505, ARI: 0.6545, NMI: 0.8565 |
| image-clustering-on-imagenet-200 | TEMI CLIP ViT-L (openai) | - |
| image-clustering-on-imagenet-200 | TEMI DINO ViT-B | - |
| image-clustering-on-imagenet-200 | TEMI MSN ViT-L | - |
| image-clustering-on-imagenet-50-1 | TEMI DINO ViT-B | Accuracy: 0.801, ARI: 0.7093, NMI: 0.8610 |
| image-clustering-on-imagenet-50-1 | TEMI CLIP ViT-L (openai) | Accuracy: 0.8827, ARI: 0.8272, NMI: 0.9232 |
| image-clustering-on-imagenet-50-1 | TEMI MSN ViT-L | Accuracy: 0.8487, ARI: 0.7646, NMI: 0.8814 |
| image-clustering-on-stl-10 | TEMI DINO ViT-B | Accuracy: 0.985, ARI: 0.968, NMI: 0.965, Backbone: ViT-B, Train set: Train |
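
Cluster IDs carry no inherent class meaning, so clustering accuracy is usually computed after finding the one-to-one mapping from clusters to ground-truth classes that maximizes the number of correct samples. The sketch below assumes that convention for the Accuracy values above. The function name `clustering_accuracy` and the toy labels are hypothetical, and the brute-force search over permutations is only feasible for a handful of clusters; real evaluations typically solve the same matching with the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment`.

```python
from itertools import permutations

def clustering_accuracy(true_labels, cluster_ids):
    """Accuracy under the best one-to-one mapping from clusters to classes.

    Brute force over permutations: a sketch for small problems only.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("true_labels and cluster_ids must have the same length")
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(clusters) > len(classes):
        raise ValueError("this sketch assumes no more clusters than classes")
    best_correct = 0
    # Try every injective assignment of clusters to distinct classes.
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        correct = sum(1 for t, c in zip(true_labels, cluster_ids)
                      if mapping[c] == t)
        best_correct = max(best_correct, correct)
    return best_correct / len(true_labels)

# Toy example: cluster IDs are permuted relative to the classes,
# and one sample is placed in the wrong cluster.
true_labels = [0, 0, 1, 1, 2, 2]
cluster_ids = [2, 2, 0, 0, 1, 0]
accuracy = clustering_accuracy(true_labels, cluster_ids)
print(accuracy)
```

ARI and NMI, the other two metrics in the table, do not need this matching step because they are invariant to relabeling of the clusters.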