
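Abstract
Learning under reduced labeling standards, such as noisy labels, partial labels, and multiple label candidates, is a common challenge in machine learning tasks. We refer to such labels collectively as imprecise labels. Previous methods typically design a dedicated solution for each newly emerging imprecise label configuration, a strategy that is difficult to sustain when multiple configurations of imprecision coexist. This paper proposes imprecise label learning (ILL), a unified framework for learning under various imprecise label configurations. ILL uses expectation-maximization (EM) to model the imprecise label information, treating the true labels as latent variables. Rather than approximating the correct labels during training, as earlier methods do, ILL considers the entire distribution of all possible labelings entailed by the imprecise information, and thereby makes fuller use of label uncertainty. Experiments show that ILL adapts seamlessly to partial label learning, semi-supervised learning, noisy label learning, and, more importantly, mixtures of these settings. Notably, ILL significantly outperforms existing specialized methods for handling imprecise labels, making it the first unified framework with robust and effective performance across a variety of challenging settings. We hope this work will inspire further research in this direction and promote the application of imprecise label learning in a wider range of practical scenarios, especially in domains where precise labels are expensive and complicated to obtain.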
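To make the EM view concrete, the following is a minimal sketch for the partial label setting only, assuming each sample comes with a binary mask over its candidate labels. It is not the authors' implementation: the function names (`e_step`, `m_step_loss`), the toy logits, and the candidate masks are all hypothetical and chosen for illustration. The E-step renormalizes the model's predicted probabilities over the candidate set to obtain a posterior over the latent true label, and the M-step objective is the expected negative log-likelihood under that posterior.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def e_step(probs, candidate_mask):
    # Posterior over the latent true label, restricted to the candidate set
    # (assumption: the true label is always among the candidates).
    masked = probs * candidate_mask
    return masked / masked.sum(axis=1, keepdims=True)

def m_step_loss(probs, posterior):
    # Expected negative log-likelihood under the E-step posterior.
    # In training, the posterior would be held fixed while this is minimized.
    return float(-(posterior * np.log(probs + 1e-12)).sum(axis=1).mean())

# Hypothetical toy data: 2 samples, 3 classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.2, 0.3, 3.0]])
candidate_mask = np.array([[1, 1, 0],
                           [0, 1, 1]], dtype=float)

probs = softmax(logits)
posterior = e_step(probs, candidate_mask)
loss = m_step_loss(probs, posterior)
print(posterior)
print(loss)
```

The point of the sketch is that no single label is picked for a sample: every candidate contributes to the loss in proportion to its posterior weight, which is one simple way to use the full distribution over possible labelings rather than a single guessed label.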
Code Repositories
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| learning-with-noisy-labels-on-cifar-100n | ILL | Accuracy (mean): 65.84 |
| learning-with-noisy-labels-on-cifar-10n | ILL | Accuracy (mean): 95.47 |
| learning-with-noisy-labels-on-cifar-10n-1 | ILL | Accuracy (mean): 94.85 |
| learning-with-noisy-labels-on-cifar-10n-2 | ILL | Accuracy (mean): 95.04 |
| learning-with-noisy-labels-on-cifar-10n-3 | ILL | Accuracy (mean): 95.13 |
| learning-with-noisy-labels-on-cifar-10n-worst | ILL | Accuracy (mean): 93.58 |
| learning-with-noisy-labels-on-clothing1m | ILL | Test Accuracy: 74.02 |
| learning-with-noisy-labels-on-mini-webvision | ILL | Top 1 Accuracy: 79.37 |
| partial-label-learning-on-caltech-ucsd-birds | ILL | Accuracy: 70.77 |
| partial-label-learning-on-cifar-10-partial | ILL | Accuracy: 96.37 |
| partial-label-learning-on-cifar-10-partial-1 | ILL | Accuracy: 96.26 |
| partial-label-learning-on-cifar-10-partial-2 | ILL | Accuracy: 95.91 |
| partial-label-learning-on-cifar-100-partial | ILL | Accuracy: 75.31 |
| partial-label-learning-on-cifar-100-partial-1 | ILL | Accuracy: 74.58 |
| partial-label-learning-on-cifar-100-partial-2 | ILL | Accuracy: 74 |