
Abstract
Reproducibility and replicability of experiments are critical issues in machine learning. Researchers have long been concerned about these problems in scientific publications, and improvements are needed to raise the overall quality of research in the field. In recent years, graph representation learning has attracted considerable attention and produced a large body of work. In this context, many Graph Neural Network (GNN) models have been proposed to effectively tackle graph classification tasks. However, current experimental pipelines are often not rigorous and are difficult to reproduce. To address this problem, we systematically review common practices that should be avoided when comparing fairly against the state of the art. To counter this worrying trend, we ran more than 47,000 experiments in a controlled and uniform framework, re-evaluating five popular graph classification models on nine commonly used benchmark datasets. In addition, by comparing GNN models against baselines that do not use graph structural information, we provide strong evidence that, on some datasets, structural information has not yet been fully exploited. We believe this work helps advance the graph learning field by providing a much-needed grounding for the rigorous evaluation of graph classification models.
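To make the comparison with structure-agnostic baselines concrete, below is a minimal PyTorch sketch of a classifier that ignores the graph topology entirely: it sum-pools node features into a single graph-level vector and feeds it to an MLP. The module name, layer sizes, and the choice of sum aggregation are illustrative assumptions, not the exact baseline configuration used in the paper.

```python
import torch
import torch.nn as nn


class StructureAgnosticBaseline(nn.Module):
    """Hypothetical baseline that classifies a graph without using its edges.

    Node features are summed into one vector per graph and passed to an MLP.
    Layer sizes and sum aggregation are illustrative choices only.
    """

    def __init__(self, in_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, node_features: torch.Tensor, graph_index: torch.Tensor) -> torch.Tensor:
        # node_features: (num_nodes, in_dim) for a batch of graphs
        # graph_index:   (num_nodes,) mapping each node to its graph id
        num_graphs = int(graph_index.max().item()) + 1
        pooled = torch.zeros(num_graphs, node_features.size(1), device=node_features.device)
        pooled.index_add_(0, graph_index, node_features)  # sum-pool node features per graph
        return self.mlp(pooled)  # (num_graphs, num_classes) logits
```

If a baseline of this kind matches or beats a GNN on a given dataset, that dataset's structural information is arguably not being exploited by the GNN.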
Code Repositories

| Repository | Framework | Status |
|---|---|---|
| toinesayan/node-classification-and-label-dependencies | pytorch | Mentioned in GitHub |
| diningphil/CGMM | pytorch | Mentioned in GitHub |
| diningphil/gnn-comparison | pytorch | Official |
| diningphil/icgmm | pytorch | Mentioned in GitHub |
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| graph-classification-on-collab | GraphSAGE | Accuracy: 73.9% |
| graph-classification-on-dd | DGCNN | Accuracy: 76.6% |
| graph-classification-on-enzymes | GraphSAGE | Accuracy: 58.2% |
| graph-classification-on-enzymes | GIN | Accuracy: 59.6% |
| graph-classification-on-imdb-b | GraphSAGE | Accuracy: 68.8% |
| graph-classification-on-imdb-m | GraphSAGE | Accuracy: 47.6% |
| graph-classification-on-nci1 | GIN | Accuracy: 80% |
| graph-classification-on-nci1 | DGCNN | Accuracy: 76.4% |
| graph-classification-on-proteins | GraphSAGE | Accuracy: 73% |
| graph-classification-on-proteins | DiffPool | Accuracy: 73.7% |
| graph-classification-on-reddit-b | GraphSAGE | Accuracy: 84.3% |
| graph-classification-on-reddit-multi-5k | GraphSAGE | Accuracy: 50% |
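The accuracies above come from the controlled and uniform evaluation framework referenced in the abstract. As a rough illustration of what such a procedure can look like, the sketch below runs a stratified k-fold outer loop for risk assessment and selects hyperparameters on an inner holdout split; the 10-fold setting, the `train_and_score` helper, and the hyperparameter grid are assumptions for illustration, not the paper's exact protocol.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split


def assess(graphs, labels, hyperparameter_grid, train_and_score, k: int = 10, seed: int = 0):
    """Illustrative risk-assessment loop: outer stratified k-fold for testing,
    inner holdout for model selection.

    `train_and_score(train_graphs, eval_graphs, config)` is a hypothetical
    helper that trains a model with `config` and returns accuracy on the
    second argument.
    """
    labels = np.asarray(labels)
    outer = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    test_scores = []
    for train_idx, test_idx in outer.split(np.zeros(len(labels)), labels):
        # Inner holdout split (10% of the training fold) for hyperparameter selection.
        tr_idx, val_idx = train_test_split(
            train_idx, test_size=0.1, stratify=labels[train_idx], random_state=seed
        )
        best_config, best_val = None, -np.inf
        for config in hyperparameter_grid:
            val_acc = train_and_score([graphs[i] for i in tr_idx],
                                      [graphs[i] for i in val_idx], config)
            if val_acc > best_val:
                best_config, best_val = config, val_acc
        # Retrain on the full training fold with the selected configuration
        # and evaluate once on the held-out test fold.
        test_acc = train_and_score([graphs[i] for i in train_idx],
                                   [graphs[i] for i in test_idx], best_config)
        test_scores.append(test_acc)
    return float(np.mean(test_scores)), float(np.std(test_scores))
```

Reporting the mean and standard deviation over the outer folds, rather than the best single run, is what makes comparisons of this kind fair across models.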