
Abstract
In recent years, the pretrain-finetune paradigm has been widely studied and adopted for graph neural networks (GNNs) and successfully applied to a variety of graph mining tasks. The success of this paradigm is usually attributed to structural consistency between the pre-training dataset and the downstream dataset. However, in many real-world scenarios this structural consistency does not hold. Prior work has shown that structural divergence between pre-training graphs and downstream graphs significantly limits the transferability of conventional fine-tuning strategies: the model overfits the pre-training graphs while failing to capture the structural properties of the downstream graphs. In this paper, we identify the root cause of this structural divergence as a mismatch in generative patterns between the pre-training and downstream graphs. To address it, we propose a new method, G-Tuning, that preserves the generative patterns of the downstream graphs. Given a downstream graph G, the core idea is to tune the pre-trained GNN so that it can reconstruct the graphon W of G, a continuous symmetric function that describes the graph's generative mechanism. However, exact graphon reconstruction is computationally prohibitive. To overcome this challenge, we provide a theoretical analysis showing that for any given graphon there exists a set of alternative graphons, called graphon bases, whose linear combination efficiently approximates the original graphon W. This result forms the theoretical foundation of our method and allows the graphon bases and their coefficients to be learned effectively. Compared with existing methods, G-Tuning achieves average improvements of 0.5% on in-domain and 2.6% on out-of-domain transfer learning experiments, demonstrating its superiority.
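The graphon-basis approximation admits a compact illustration. Below is a minimal PyTorch sketch, not the official G-Tuning implementation, that estimates a step-function graphon from a downstream graph's adjacency matrix and fits a convex combination of learnable symmetric bases to it. All names here (`empirical_graphon`, `GraphonApproximator`, `n_bases`, `resolution`) and the degree-sorting estimator are illustrative assumptions, not taken from the paper.

```python
# Sketch of the core idea: approximate a downstream graphon W as a linear
# combination of learnable "graphon bases". Hypothetical names throughout;
# the paper's actual parameterization and losses may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


def empirical_graphon(adj: torch.Tensor, resolution: int) -> torch.Tensor:
    """Estimate a step-function graphon from an adjacency matrix by
    sorting nodes by degree and average-pooling the sorted matrix."""
    order = torch.argsort(adj.sum(dim=1), descending=True)
    sorted_adj = adj[order][:, order]
    # Average-pool the sorted adjacency down to a resolution x resolution grid.
    return F.adaptive_avg_pool2d(sorted_adj[None, None], resolution)[0, 0]


class GraphonApproximator(nn.Module):
    """Approximates a target graphon as sum_k alpha_k * B_k, where the
    bases B_k and coefficients alpha are learned jointly."""

    def __init__(self, n_bases: int = 4, resolution: int = 32):
        super().__init__()
        self.bases = nn.Parameter(torch.rand(n_bases, resolution, resolution))
        self.coeffs = nn.Parameter(torch.ones(n_bases) / n_bases)

    def forward(self) -> torch.Tensor:
        # Symmetrize each basis so the combination is a valid graphon,
        # then squash values into [0, 1].
        sym = 0.5 * (self.bases + self.bases.transpose(1, 2))
        alpha = torch.softmax(self.coeffs, dim=0)  # convex combination
        return torch.sigmoid((alpha[:, None, None] * sym).sum(dim=0))


# Usage: fit the basis combination to the empirical graphon of one graph.
adj = (torch.rand(100, 100) < 0.1).float()
adj = torch.triu(adj, 1)
adj = adj + adj.T  # random symmetric adjacency as a stand-in downstream graph
target = empirical_graphon(adj, resolution=32)

model = GraphonApproximator()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(200):
    loss = F.mse_loss(model(), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The softmax over coefficients keeps the combination convex, which is one simple way to keep the approximation within the range of a valid graphon; in G-Tuning the reconstruction objective would additionally be tied to the pre-trained GNN's representations during fine-tuning.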
Code Repository
zjunet/G-Tuning (official, PyTorch)
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| graph-classification-on-bace | G-Tuning | ROC-AUC: 84.79 |
| graph-classification-on-bbbp | G-Tuning | ROC-AUC: 72.59 |
| graph-classification-on-clintox | G-Tuning | ROC-AUC: 74.64 |
| graph-classification-on-enzymes | G-Tuning | Accuracy (10-fold): 26.70 |
| graph-classification-on-hiv | G-Tuning | ROC-AUC: 77.33 |
| graph-classification-on-imdb-b | G-Tuning | Accuracy (10-fold): 74.30 |
| graph-classification-on-imdb-m | G-Tuning | Accuracy (10-fold): 51.80 |
| graph-classification-on-msrc-21-per-class | G-Tuning | Accuracy (10-fold): 11.01 |
| graph-classification-on-mutag | G-Tuning | Accuracy (10-fold): 86.14 |
| graph-classification-on-muv | G-Tuning | ROC-AUC: 75.84 |
| graph-classification-on-proteins | G-Tuning | Accuracy (10-fold): 72.05 |
| graph-classification-on-reddit-12k | G-Tuning | Accuracy (10-fold): 42.80 |
| graph-classification-on-sider | G-Tuning | ROC-AUC: 61.40 |
| graph-classification-on-tox21 | G-Tuning | ROC-AUC: 75.80 |
| graph-classification-on-toxcast | G-Tuning | ROC-AUC: 64.25 |