Abstract
Large-scale image datasets inevitably contain noisy labels, which cause deep neural networks to overfit and degrade model performance. Most existing methods for learning with noisy labels adopt a single-stage framework in which the partitioning of the training data is intertwined with semi-supervised learning (SSL) during optimization, so their effectiveness depends heavily on the accuracy of the separated clean set, on prior knowledge about the noise, and on the robustness of the SSL method. To address this, we propose PSSCL (Progressive Sample Selection with Contrastive Loss), a two-stage framework that strengthens robustness by combining a robust loss with a contrastive loss. The first stage uses a long-term confidence detection strategy to identify a small but reliable clean set; the second stage expands this clean set to further boost performance. Across multiple benchmarks, PSSCL delivers significant improvements over state-of-the-art methods. Code is available at https://github.com/LanXiaoPang613/PSSCL.
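For intuition, the sketch below illustrates the general idea behind long-term confidence based clean-sample detection: track the model's confidence in each sample's given label across epochs and keep only the samples that are consistently predicted with high confidence. This is a minimal illustration under assumed details; the function name, the averaging rule, and the `selection_ratio` threshold are not the authors' implementation, which should be taken from the repository above.

```python
import numpy as np

def select_clean_subset(confidence_history, selection_ratio=0.2):
    """Pick a small, likely-clean subset via long-term confidence.

    confidence_history: array of shape (num_epochs, num_samples); each entry
    is the model's softmax confidence in the *given* label at that epoch.
    Returns a boolean mask marking the selected clean samples.
    """
    # Average confidence over recorded epochs (the "long-term" view),
    # rather than trusting a single, possibly unstable snapshot.
    long_term_conf = confidence_history.mean(axis=0)

    # Keep only the top fraction by long-term confidence, so the first stage
    # trains on a small but highly precise clean set; the rest is treated as
    # unlabeled data for the semi-supervised stage.
    num_samples = long_term_conf.shape[0]
    num_clean = max(1, int(selection_ratio * num_samples))
    clean_idx = np.argsort(long_term_conf)[::-1][:num_clean]

    clean_mask = np.zeros(num_samples, dtype=bool)
    clean_mask[clean_idx] = True
    return clean_mask
```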
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| image-classification-on-mini-webvision-1-0 | PSSCL (130 epochs) | ImageNet Top-1 Accuracy: 79.68; ImageNet Top-5 Accuracy: 95.16; Top-1 Accuracy: 79.56; Top-5 Accuracy: 94.84 |
| image-classification-on-mini-webvision-1-0 | PSSCL (120 epochs) | ImageNet Top-1 Accuracy: 79.40; ImageNet Top-5 Accuracy: 94.84; Top-1 Accuracy: 78.52; Top-5 Accuracy: 93.80 |
| learning-with-noisy-labels-on-animal | PSSCL | Accuracy: 88.74; ImageNet Pretrained: No; Network: Vgg19-BN |
| learning-with-noisy-labels-on-cifar-100n | PSSCL | Accuracy (mean): 72.00 |
| learning-with-noisy-labels-on-cifar-10n | PSSCL | Accuracy (mean): 96.41 |
| learning-with-noisy-labels-on-cifar-10n-1 | PSSCL | Accuracy (mean): 96.17 |
| learning-with-noisy-labels-on-cifar-10n-2 | PSSCL | Accuracy (mean): 96.21 |
| learning-with-noisy-labels-on-cifar-10n-3 | PSSCL | Accuracy (mean): 96.49 |
| learning-with-noisy-labels-on-cifar-10n-worst | PSSCL | Accuracy (mean): 95.12 |
| learning-with-noisy-labels-on-food-101 | PSSCL | Accuracy (%): 86.41 |