Global Self-Attention as a Replacement for Graph Convolution

Md Shamim Hussain; Mohammed J. Zaki; Dharmashankar Subramanian

Abstract

We propose an extension to the transformer neural network architecture for general-purpose graph learning by adding a dedicated pathway for pairwise structural information, called edge channels. The resultant framework - which we call Edge-augmented Graph Transformer (EGT) - can directly accept, process and output structural information of arbitrary form, which is important for effective learning on graph-structured data. Our model exclusively uses global self-attention as an aggregation mechanism rather than static localized convolutional aggregation. This allows for unconstrained long-range dynamic interactions between nodes. Moreover, the edge channels allow the structural information to evolve from layer to layer, and prediction tasks on edges/links can be performed directly from the output embeddings of these channels. We verify the performance of EGT in a wide range of graph-learning experiments on benchmark datasets, in which it outperforms Convolutional/Message-Passing Graph Neural Networks. EGT sets a new state of the art for the quantum-chemical regression task on the OGB-LSC PCQM4Mv2 dataset containing 3.8 million molecular graphs. Our findings indicate that global self-attention based aggregation can serve as a flexible, adaptive and effective replacement for graph convolution in general-purpose graph learning. Therefore, convolutional local neighborhood aggregation is not an essential inductive bias.
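
To make the aggregation mechanism concrete, below is a minimal single-head PyTorch sketch of an EGT-style layer. It is not the authors' implementation (see the official repositories below); the class and parameter names (`EGTLayerSketch`, `edge_bias`, `edge_gate`, `edge_update`) and all dimensions are illustrative, and the real model additionally uses multiple attention heads, normalization, and feed-forward sublayers. The key idea sketched here is that edge channels add a bias to the all-pairs attention logits, gate the resulting attention weights, and are themselves updated from those logits, so structural information evolves across layers.

```python
import torch
import torch.nn as nn

class EGTLayerSketch(nn.Module):
    """Simplified single-head sketch of an edge-augmented transformer layer.

    h: node embeddings,        shape (batch, n, d_node)
    e: edge-channel embeddings, shape (batch, n, n, d_edge)
    """

    def __init__(self, d_node: int, d_edge: int):
        super().__init__()
        self.q = nn.Linear(d_node, d_node)
        self.k = nn.Linear(d_node, d_node)
        self.v = nn.Linear(d_node, d_node)
        self.edge_bias = nn.Linear(d_edge, 1)    # additive bias on attention logits
        self.edge_gate = nn.Linear(d_edge, 1)    # sigmoid gate on attention weights
        self.edge_update = nn.Linear(1, d_edge)  # maps logits back into edge channels
        self.out = nn.Linear(d_node, d_node)
        self.scale = d_node ** 0.5

    def forward(self, h, e):
        # Global (all-pairs) attention logits, biased by the edge channels.
        logits = torch.einsum('bid,bjd->bij', self.q(h), self.k(h)) / self.scale
        logits = logits + self.edge_bias(e).squeeze(-1)

        # Gated softmax aggregation over *all* nodes: no fixed neighborhood.
        gates = torch.sigmoid(self.edge_gate(e)).squeeze(-1)
        attn = torch.softmax(logits, dim=-1) * gates
        h_new = h + self.out(torch.einsum('bij,bjd->bid', attn, self.v(h)))

        # Edge channels evolve from the pairwise logits; their final-layer
        # outputs can feed an edge/link prediction head directly.
        e_new = e + self.edge_update(logits.unsqueeze(-1))
        return h_new, e_new

# Illustrative usage with hypothetical dimensions: 2 graphs, 10 nodes each.
layer = EGTLayerSketch(d_node=64, d_edge=16)
h = torch.randn(2, 10, 64)        # node embeddings
e = torch.randn(2, 10, 10, 16)    # pairwise edge-channel embeddings
h, e = layer(h, e)                # shapes are preserved layer to layer
```

Because the gates are learned per node pair, such a layer can recover sparse or localized aggregation patterns when they are useful, rather than having a fixed neighborhood hard-coded as an inductive bias.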

Code Repositories

shamim-hussain/egt_triangular (official, PyTorch)
shamim-hussain/egt_pytorch (official, PyTorch)
shamim-hussain/egt (official, TensorFlow)

Benchmarks

Benchmark | Methodology | Metrics
graph-classification-on-cifar10-100k | EGT | Accuracy (%): 68.702
graph-classification-on-mnist | EGT | Accuracy: 98.173
graph-property-prediction-on-ogbg-molhiv | EGT | Test ROC-AUC: 0.806 ± 0.0065
graph-property-prediction-on-ogbg-molpcba | EGT | Test AP: 0.2961 ± 0.0024
graph-regression-on-pcqm4m-lsc | EGT | Validation MAE: 0.1224
graph-regression-on-pcqm4mv2-lsc | EGT | Test MAE: 0.0862; Validation MAE: 0.0857
graph-regression-on-pcqm4mv2-lsc | EGT + Triangular Attention | Test MAE: 0.0683; Validation MAE: 0.0671
graph-regression-on-zinc-100k | EGT | MAE: 0.143
graph-regression-on-zinc-500k | EGT | MAE: 0.108
link-prediction-on-tsp-hcp-benchmark-set | EGT | F1: 0.853
node-classification-on-cluster | EGT | Accuracy: 79.232
node-classification-on-pattern | EGT | Accuracy: 86.821
node-classification-on-pattern-100k | EGT | Accuracy (%): 86.816
