
Abstract
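Clustering is a classic data mining task that groups data instances according to their similarity. The task is exploratory and unsupervised; its results depend heavily on many parameter settings and usually require several iterations by an expert before they become satisfactory. Constrained clustering was introduced to better model the expert's expectations. However, conventional constrained clustering remains limited: it usually requires all constraints to be given before the clustering process starts, which makes it inflexible. This paper proposes a more general problem formulation that models the expert's interactive decision process in exploratory clustering as a sequence of dynamic clustering modifications, allowing the expert to add constraints while clustering is in progress. To this end, we propose an incremental constrained clustering framework that combines active query strategies with a Constraint Programming model. The framework satisfies the expert's expectations while keeping the partition stable, so that the expert can clearly follow how the clustering evolves and what effect each change has. The model supports instance-level and group-level constraints and allows constraints to be relaxed. Experiments on standard datasets and a case study on the analysis of satellite image time series show that the proposed framework is practical and effective.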
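The two ideas the abstract combines, satisfying expert constraints and keeping successive partitions stable, can be illustrated with a small sketch. This is not the paper's Constraint Programming model. It assumes instance-level constraints are given as must-link and cannot-link index pairs, and it assumes stability is measured as the Adjusted Rand Index (ARI) between the partition before and after a modification. The function names `violated` and `adjusted_rand_index` are hypothetical.

```python
from collections import Counter
from math import comb


def violated(labels, must_link, cannot_link):
    """Return the instance-level constraints that a partition breaks.

    labels[i] is the cluster of instance i; constraints are (i, j) index pairs.
    """
    broken = []
    for i, j in must_link:
        if labels[i] != labels[j]:
            broken.append(("ML", i, j))
    for i, j in cannot_link:
        if labels[i] == labels[j]:
            broken.append(("CL", i, j))
    return broken


def adjusted_rand_index(a, b):
    """Adjusted Rand Index between two labelings of the same instances."""
    total = comb(len(a), 2)
    if total == 0:
        return 1.0  # fewer than two instances: the partitions trivially agree
    # Pair counts from the contingency table and from its two marginals.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions have one cluster
    return (sum_ij - expected) / (max_index - expected)


# Assumed scenario: the expert adds a must-link between instances 3 and 4.
previous = [0, 0, 0, 1, 2, 2]
updated = [0, 0, 0, 2, 2, 2]
must_link, cannot_link = [(3, 4)], [(0, 5)]

broken_before = violated(previous, must_link, cannot_link)
broken_after = violated(updated, must_link, cannot_link)
stability = adjusted_rand_index(previous, updated)
```

Here `broken_before` lists the new must-link as unsatisfied, `broken_after` is empty, and `stability` quantifies how much of the earlier partition the modification preserved. An incremental method in the spirit of the abstract would look for an update that empties the violation list while keeping this value high.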
Benchmarks
| Benchmark | Method | AUBC-ARI (quality) | AUBC-ARI (similarity) |
|---|---|---|---|
| incremental-constrained-clustering-on-iris | COP-KMeans+Random | 0.712±0.012 | 0.309±0.004 |
| incremental-constrained-clustering-on-iris | MPCK-Means+Random | 0.783±0.025 | 0.472±0.024 |
| incremental-constrained-clustering-on-iris | PCK-Means+Random | 0.695±0.018 | 0.271±0.008 |
| incremental-constrained-clustering-on-iris | IAC+Random | 0.816±0.014 | 0.605±0.016 |
| incremental-constrained-clustering-on-iris | PCK-Means+NPU | 0.876±0.018 | 0.398±0.029 |
| incremental-constrained-clustering-on-iris | IAC+NPU | 0.941±0.007 | 0.668±0.02 |
| incremental-constrained-clustering-on-iris | COP-KMeans+NPU | 0.88±0.016 | 0.432±0.029 |
| incremental-constrained-clustering-on-iris | MPCK-Means+NPU | 0.928±0.015 | 0.584±0.027 |
| incremental-constrained-clustering-on-wine | PCK-Means+NPU | 0.472±0.017 | 0.337±0.011 |
| incremental-constrained-clustering-on-wine | MPCK-Means+NPU | 0.893±0.016 | 0.817±0.002 |
| incremental-constrained-clustering-on-wine | COP-KMeans+Random | 0.369±0.003 | 0.241±0.008 |
| incremental-constrained-clustering-on-wine | IAC+Random | 0.349±0.01 | 0.441±0.012 |
| incremental-constrained-clustering-on-wine | PCK-Means+Random | 0.371±0.003 | 0.332±0.01 |
| incremental-constrained-clustering-on-wine | MPCK-Means+Random | 0.821±0.005 | 0.845±0.006 |
| incremental-constrained-clustering-on-wine | IAC+NPU | 0.481±0.016 | 0.455±0.09 |
| incremental-constrained-clustering-on-wine | COP-KMeans+NPU | 0.469±0.019 | 0.340±0.001 |