
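Abstract
We propose a recipe for building a general, powerful, scalable (GPS) graph Transformer with linear complexity and state-of-the-art results on a diverse set of benchmarks. Graph Transformers (GTs) have attracted wide attention in graph representation learning, with many recent publications, but they lack a common foundation about what constitutes a good positional or structural encoding and what differentiates the two. In this paper, we give the different types of encodings a clearer definition and categorize them as $\textit{local}$, $\textit{global}$, or $\textit{relative}$. Prior GTs are restricted to small graphs with a few hundred nodes; here we propose the first architecture whose complexity is linear in the number of nodes and edges, $O(N+E)$, achieved by decoupling the local real-edge aggregation from the fully connected Transformer. We argue that this decoupling does not negatively affect expressivity, and that our architecture can serve as a universal function approximator on graphs. Our GPS recipe consists of choosing three main ingredients: (i) positional/structural encoding, (ii) a local message-passing mechanism, and (iii) a global attention mechanism. We provide a modular framework, $\textit{GraphGPS}$, that supports multiple types of encodings and remains efficient and scalable on both small and large graphs. We test our architecture on 16 benchmarks and show highly competitive results on all of them, demonstrating the empirical benefits of modularity and of combining different strategies.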
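The three ingredients named in the abstract can be illustrated with a toy, dependency-free sketch. This is not the GraphGPS implementation. It assumes node in-degree as a stand-in structural encoding, plain sum aggregation as the local message-passing step, and a kernelized linear attention (feature map `elu(x) + 1`, with no learned weights) as the global step. All function names are hypothetical. The point is only that the local part costs O(E) and the global part costs O(N) for a fixed feature width, so one layer scales as O(N+E).

```python
import math

def add_degree_encoding(x, edges):
    """(i) Structural encoding (assumed): append each node's in-degree."""
    deg = [0.0] * len(x)
    for _, dst in edges:
        deg[dst] += 1.0
    return [row + [d] for row, d in zip(x, deg)]

def local_sum_aggregate(x, edges):
    """(ii) Local message passing over real edges only: O(E * d)."""
    out = [[0.0] * len(x[0]) for _ in x]
    for src, dst in edges:
        for k, v in enumerate(x[src]):
            out[dst][k] += v
    return out

def feature_map(v):
    """Positive kernel feature map elu(t) + 1."""
    return [t + 1.0 if t > 0 else math.exp(t) for t in v]

def global_linear_attention(x):
    """(iii) Global attention with q = k = v = x, computed in O(N * d^2).

    Instead of forming the N x N attention matrix, accumulate
    kv = sum_j phi(x_j) x_j^T and z = sum_j phi(x_j) once, then reuse them.
    """
    d = len(x[0])
    phi = [feature_map(row) for row in x]
    kv = [[0.0] * d for _ in range(d)]
    z = [0.0] * d
    for p, row in zip(phi, x):
        for a in range(d):
            z[a] += p[a]
            for b in range(d):
                kv[a][b] += p[a] * row[b]
    out = []
    for p in phi:
        norm = sum(p[a] * z[a] for a in range(d))
        out.append([sum(p[a] * kv[a][b] for a in range(d)) / norm
                    for b in range(d)])
    return out

def gps_layer(x, edges):
    """Sum the residual, local, and global branches (no MLP or normalization)."""
    local = local_sum_aggregate(x, edges)
    glob = global_linear_attention(x)
    return [[xi + li + gi for xi, li, gi in zip(rx, rl, rg)]
            for rx, rl, rg in zip(x, local, glob)]

# Usage: a 3-node path graph 0-1-2, each undirected edge stored in both directions.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edge_list = [(0, 1), (1, 0), (1, 2), (2, 1)]
encoded = add_degree_encoding(features, edge_list)
print(gps_layer(encoded, edge_list))
```

Keeping the local and global branches as separate functions mirrors the decoupling described above: either branch can be swapped for a different mechanism without touching the other.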
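Code Repositories
| Repository | Framework | Notes |
|---|---|---|
| graphcore/ogb-lsc-pcqm4mv2 | TensorFlow | Mentioned on GitHub |
| rampasek/GraphGPS | PyTorch | Official; mentioned on GitHub |
| linusbao/MoSE | PyTorch | Mentioned on GitHub |
| hamed1375/exphormer | PyTorch | Mentioned on GitHub |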
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| graph-classification-on-cifar10-100k | GPS | Accuracy (%): 72.298 |
| graph-classification-on-enzymes | GraphGPS | Accuracy: 78.667±4.625 |
| graph-classification-on-imdb-b | GraphGPS | Accuracy: 79.250±3.096 |
| graph-classification-on-malnet-tiny | GPS | Accuracy: 93.36 ± 0.6 |
| graph-classification-on-mnist | GPS | Accuracy: 98.05 |
| graph-classification-on-nci1 | GraphGPS | Accuracy: 85.110±1.423 |
| graph-classification-on-nci109 | GraphGPS | Accuracy: 81.256±0.501 |
| graph-classification-on-peptides-func | GPS | AP: 0.6535±0.0041 |
| graph-classification-on-proteins | GraphGPS | Accuracy: 77.143±1.494 |
| graph-property-prediction-on-ogbg-code2 | GPS | Ext. data: No; Number of params: 12454066; Test F1 score: 0.1894; Validation F1 score: 0.1739 ± 0.001 |
| graph-property-prediction-on-ogbg-molhiv | GPS | Ext. data: No; Number of params: 558625; Test ROC-AUC: 0.7880; Validation ROC-AUC: 0.8255 ± 0.0092 |
| graph-property-prediction-on-ogbg-molpcba | GPS | Ext. data: No; Number of params: 9744496; Test AP: 0.2907; Validation AP: 0.3015 ± 0.0038 |
| graph-property-prediction-on-ogbg-ppa | GPS | Ext. data: No; Number of params: 3434533; Test Accuracy: 0.8015; Validation Accuracy: 0.7556 ± 0.0027 |
| graph-regression-on-lipophilicity | GraphGPS | R2: 0.790±0.004; RMSE: 0.579±0.006 |
| graph-regression-on-pcqm4mv2-lsc | GPS | Test MAE: 0.0862; Validation MAE: 0.0852 |
| graph-regression-on-peptides-struct | GPS | MAE: 0.2500±0.0005 |
| graph-regression-on-zinc | GPS | MAE: 0.070 ± 0.002 |
| graph-regression-on-zinc | GINE | MAE: 0.070 ± 0.004 |
| graph-regression-on-zinc-500k | GPS | MAE: 0.070 |
| graph-regression-on-zinc-full | GraphGPS | Test MAE: 0.024±0.007 |
| link-prediction-on-pcqm-contact | GPS | MRR: 0.3337±0.0006 |
| molecular-property-prediction-on-esol | GraphGPS | R2: 0.911±0.003; RMSE: 0.613±0.010 |
| molecular-property-prediction-on-freesolv | GraphGPS | R2: 0.861±0.037; RMSE: 1.462±0.188 |
| node-classification-on-cluster | GPS | Accuracy: 77.95 |
| node-classification-on-coco-sp | GPS | macro F1: 0.3412±0.0044 |
| node-classification-on-pascalvoc-sp-1 | GPS | macro F1: 0.3748±0.0109 |
| node-classification-on-pattern | GPS | Accuracy: 86.685 |