
Abstract
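A coreset is a subset of the training set such that a machine learning algorithm trained on it achieves performance similar to what it achieves when trained on the entire original dataset. Coreset discovery is an active and open line of research, because it both speeds up training and helps humans better understand model results. Building on previous work, this paper proposes a new approach: candidate coresets are optimized iteratively by adding and removing samples. Because there is a clear trade-off between limiting training size and the quality of the results, a multi-objective evolutionary algorithm is used to simultaneously minimize the number of points in the set and the classification error. Experimental results on non-trivial benchmarks show that the proposed approach lets a classifier reach lower error and better generalization than state-of-the-art coreset discovery techniques.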
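The two-objective formulation described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: a candidate coreset is encoded as a boolean mask over the training set, the classifier is a plain 1-nearest-neighbour rule, error is measured on the full training set, and a simple Pareto archive with single-bit mutation stands in for whichever multi-objective evolutionary algorithm the authors use. The names `nn_error`, `objectives`, `dominates`, `pareto_front`, and `evolve` are hypothetical.

```python
import random


def nn_error(mask, X, y):
    """Error rate over all of X of a 1-NN classifier that only sees the masked samples."""
    core = [i for i, keep in enumerate(mask) if keep]
    if not core:
        return 1.0  # assumption: an empty coreset misclassifies everything
    wrong = 0
    for xi, yi in zip(X, y):
        nearest = min(core, key=lambda j: sum((a - b) ** 2 for a, b in zip(xi, X[j])))
        wrong += y[nearest] != yi
    return wrong / len(X)


def objectives(mask, X, y):
    """Both objectives are minimized: (coreset size, classification error)."""
    return (sum(mask), nn_error(mask, X, y))


def dominates(a, b):
    """Pareto dominance: a is no worse than b everywhere and differs from b."""
    return all(p <= q for p, q in zip(a, b)) and a != b


def pareto_front(scored):
    """Keep the (mask, fitness) pairs whose fitness no other pair dominates."""
    return [(m, f) for m, f in scored if not any(dominates(g, f) for _, g in scored)]


def evolve(X, y, generations=50, offspring=20, seed=0):
    """Toy loop: mutate archive members by adding or removing one sample."""
    rng = random.Random(seed)
    n = len(X)
    population = [tuple(rng.random() < 0.5 for _ in range(n)) for _ in range(offspring)]
    archive = pareto_front([(m, objectives(m, X, y)) for m in set(population)])
    for _ in range(generations):
        scored = dict(archive)
        for _ in range(offspring):
            child = list(rng.choice(archive)[0])
            i = rng.randrange(n)
            child[i] = not child[i]  # flipping a bit adds or removes sample i
            child = tuple(child)
            if child not in scored:
                scored[child] = objectives(child, X, y)
        archive = pareto_front(list(scored.items()))
    return sorted(archive, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Two well-separated toy clusters (made-up data, not from the paper).
    X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    y = [0, 0, 0, 1, 1, 1]
    for mask, (size, error) in evolve(X, y):
        print(size, error, mask)
```

Each entry of the returned front is one size/error trade-off, so a user can pick the smallest coreset whose error is still acceptable. A faithful reproduction would evaluate error on held-out data and use an established multi-objective algorithm rather than this toy archive.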
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| core-set-discovery-on-abalone | EvoCore | F1(10-fold): 18.6 |
| core-set-discovery-on-amazon-employee-access | EvoCore | F1(10-fold): 91.5 |
| core-set-discovery-on-credit-g | EvoCore | F1(10-fold): 74.3 |
| core-set-discovery-on-electricity | EvoCore | F1(10-fold): 69.3 |
| core-set-discovery-on-glass-identification | EvoCore | F1(10-fold): 64.3 |
| core-set-discovery-on-isolet | EvoCore | F1(10-fold): 90.5 |
| core-set-discovery-on-jm1 | EvoCore | F1(10-fold): 77.1 |
| core-set-discovery-on-kr-vs-kp | EvoCore | F1(10-fold): 93.7 |
| core-set-discovery-on-letter | EvoCore | F1(10-fold): 65.9 |
| core-set-discovery-on-micro-mass | EvoCore | F1(10-fold): 83.9 |
| core-set-discovery-on-mnist | EvoCore | F1(10-fold): 77.2 |
| core-set-discovery-on-mozilla4 | EvoCore | F1(10-fold): 91.2 |
| core-set-discovery-on-soybean | EvoCore | F1(10-fold): 91.1 |
| core-set-discovery-on-uci-gas | EvoCore | F1(10-fold): 94.6 |