David Buterez; Jon Paul Janet; Dino Oglic; Pietro Liò

Abstract
There has been a recent surge in transformer-based architectures for learning on graphs, mainly motivated by attention as an effective learning mechanism and the desire to supersede the handcrafted operators characteristic of message passing schemes. However, concerns have been raised over their empirical effectiveness, scalability, and the complexity of their pre-processing steps, especially in relation to much simpler graph neural networks that typically perform on par with them across a wide range of benchmarks. To tackle these shortcomings, we consider graphs as sets of edges and propose a purely attention-based approach consisting of an encoder and an attention pooling mechanism. The encoder vertically interleaves masked and vanilla self-attention modules to learn effective representations of edges, while also making it possible to handle misspecifications in the input graphs. Despite its simplicity, the approach outperforms fine-tuned message passing baselines and recently proposed transformer-based methods on more than 70 node- and graph-level tasks, including challenging long-range benchmarks. Moreover, we demonstrate state-of-the-art performance across different tasks, ranging from molecular to vision graphs and heterophilous node classification. The approach also outperforms graph neural networks and transformers in transfer learning settings, and scales much better than alternatives with a similar performance level or expressive power.
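The core ideas in the abstract (edge tokens, interleaved masked/vanilla self-attention, attention pooling) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it is single-head, omits learned projections, layer norm, and feed-forward blocks, and all function names (`esa_encoder`, `edge_adjacency_mask`, `attention_pooling`) are hypothetical. The masked blocks here restrict attention to edges that share an endpoint, which is one natural reading of "masked self-attention" over an edge set.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, mask=None):
    # single-head scaled dot-product self-attention over edge tokens X: (m, d)
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # block attention outside the mask
    return softmax(scores, axis=-1) @ X

def edge_adjacency_mask(edges):
    # edges: list of (u, v) pairs; token i may attend to token j
    # iff the two edges share at least one endpoint
    m = len(edges)
    mask = np.zeros((m, m), dtype=bool)
    for i, (u1, v1) in enumerate(edges):
        for j, (u2, v2) in enumerate(edges):
            mask[i, j] = len({u1, v1} & {u2, v2}) > 0
    return mask

def esa_encoder(X, edges, n_blocks=2):
    # vertically interleave masked (graph-local) and vanilla (global)
    # self-attention, with residual connections
    mask = edge_adjacency_mask(edges)
    for _ in range(n_blocks):
        X = X + self_attention(X, mask=mask)  # masked block
        X = X + self_attention(X)             # vanilla block
    return X

def attention_pooling(X, seed):
    # pool the edge set into one graph vector: a (learned, here random)
    # seed query attends over all edge tokens
    d = X.shape[-1]
    scores = seed @ X.T / np.sqrt(d)
    return softmax(scores) @ X
```

In a trained model the seed query, as well as per-block projection matrices, would be learned parameters; here they are stand-ins to show the data flow from an edge set to a single graph-level representation.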
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| graph-classification-on-cifar10-100k | ESA (Edge set attention, no positional encodings) | Accuracy (%): 75.413±0.248 |
| graph-classification-on-dd | ESA (Edge set attention, no positional encodings) | Accuracy: 83.529±1.743 |
| graph-classification-on-enzymes | ESA (Edge set attention, no positional encodings) | Accuracy: 79.423±1.658 |
| graph-classification-on-imdb-b | ESA (Edge set attention, no positional encodings) | Accuracy: 86.250±0.957 |
| graph-classification-on-malnet-tiny | ESA (Edge set attention, no positional encodings) | Accuracy: 94.800±0.424 MCC: 0.935±0.005 |
| graph-classification-on-mnist | ESA (Edge set attention, no positional encodings) | Accuracy: 98.753±0.041 |
| graph-classification-on-mnist | ESA (Edge set attention, no positional encodings, tuned) | Accuracy: 98.917±0.020 |
| graph-classification-on-nci1 | ESA (Edge set attention, no positional encodings) | Accuracy: 87.835±0.644 |
| graph-classification-on-nci109 | ESA (Edge set attention, no positional encodings) | Accuracy: 84.976±0.551 |
| graph-classification-on-peptides-func | ESA (Edge set attention, no positional encodings, not tuned) | AP: 0.6863±0.0044 |
| graph-classification-on-peptides-func | ESA (Edge set attention, no positional encodings, tuned) | AP: 0.7071±0.0015 |
| graph-classification-on-peptides-func | ESA + RWSE (Edge set attention, Random Walk Structural Encoding, tuned) | AP: 0.7357±0.0036 |
| graph-classification-on-peptides-func | ESA + RWSE (Edge set attention, Random Walk Structural Encoding, + validation set) | AP: 0.7479 |
| graph-classification-on-proteins | ESA (Edge set attention, no positional encodings) | Accuracy: 82.679±0.799 |
| graph-regression-on-esr2 | ESA (Edge set attention, no positional encodings) | R2: 0.697±0.000 RMSE: 0.486±0.697 |
| graph-regression-on-f2 | ESA (Edge set attention, no positional encodings) | R2: 0.891±0.000 RMSE: 0.335±0.891 |
| graph-regression-on-kit | ESA (Edge set attention, no positional encodings) | R2: 0.841±0.000 RMSE: 0.433±0.841 |
| graph-regression-on-lipophilicity | ESA (Edge set attention, no positional encodings) | R2: 0.809±0.008 RMSE: 0.552±0.012 |
| graph-regression-on-parp1 | ESA (Edge set attention, no positional encodings) | R2: 0.925±0.000 RMSE: 0.343±0.925 |
| graph-regression-on-pcqm4mv2-lsc | ESA (Edge set attention, no positional encodings) | Test MAE: N/A Validation MAE: 0.0235 |
| graph-regression-on-peptides-struct | ESA + RWSE (Edge set attention, Random Walk Structural Encoding, tuned) | MAE: 0.2393±0.0004 |
| graph-regression-on-peptides-struct | ESA (Edge set attention, no positional encodings, not tuned) | MAE: 0.2453±0.0003 |
| graph-regression-on-pgr | ESA (Edge set attention, no positional encodings) | R2: 0.725±0.000 RMSE: 0.507±0.725 |
| graph-regression-on-zinc | ESA + rings + NodeRWSE + EdgeRWSE | MAE: 0.051 |
| graph-regression-on-zinc-500k | ESA + rings + NodeRWSE + EdgeRWSE | MAE: 0.051 |
| graph-regression-on-zinc-full | ESA + rings + NodeRWSE + EdgeRWSE | Test MAE: 0.0109±0.0002 |
| graph-regression-on-zinc-full | ESA + RWSE (Edge set attention, Random Walk Structural Encoding, tuned) | Test MAE: 0.0154±0.0001 |
| graph-regression-on-zinc-full | ESA + RWSE (Edge set attention, Random Walk Structural Encoding) | Test MAE: 0.017±0.001 |
| graph-regression-on-zinc-full | ESA + RWSE + CY2C (Edge set attention, Random Walk Structural Encoding, clique adjacency, tuned) | Test MAE: 0.0122±0.0004 |
| graph-regression-on-zinc-full | ESA (Edge set attention, no positional encodings) | Test MAE: 0.027±0.001 |
| molecular-property-prediction-on-esol | ESA (Edge set attention, no positional encodings) | R2: 0.944±0.002 RMSE: 0.485±0.009 |
| molecular-property-prediction-on-freesolv | ESA (Edge set attention, no positional encodings) | R2: 0.977±0.001 RMSE: 0.595±0.013 |