OGB-LSC: A Large-Scale Challenge for Machine Learning on Graphs

Weihua Hu, Matthias Fey, Hongyu Ren, Maho Nakata, Yuxiao Dong, Jure Leskovec

Abstract

Enabling effective and efficient machine learning (ML) over large-scale graph data (e.g., graphs with billions of edges) can have a great impact on both industrial and scientific applications. However, existing efforts to advance large-scale graph ML have been largely limited by the lack of a suitable public benchmark. Here we present OGB Large-Scale Challenge (OGB-LSC), a collection of three real-world datasets for facilitating the advancements in large-scale graph ML. The OGB-LSC datasets are orders of magnitude larger than existing ones, covering three core graph learning tasks -- link prediction, graph regression, and node classification. Furthermore, we provide dedicated baseline experiments, scaling up expressive graph ML models to the massive datasets. We show that expressive models significantly outperform simple scalable baselines, indicating an opportunity for dedicated efforts to further improve graph ML at scale. Moreover, OGB-LSC datasets were deployed at ACM KDD Cup 2021 and attracted more than 500 team registrations globally, during which significant performance improvements were made by a variety of innovative techniques. We summarize the common techniques used by the winning solutions and highlight the current best practices in large-scale graph ML. Finally, we describe how we have updated the datasets after the KDD Cup to further facilitate research advances. The OGB-LSC datasets, baseline code, and all the information about the KDD Cup are available at https://ogb.stanford.edu/docs/lsc/.

Code Repositories

graphcore/ogb-lsc-pcqm4mv2 (tf, mentioned in GitHub)
graphcore/distributed-kge-poplar (pytorch, mentioned in GitHub)
snap-stanford/ogb (pytorch, official)
shamim-hussain/egt (tf, mentioned in GitHub)
lars-research/3d-pgt (pytorch, mentioned in GitHub)

Benchmarks

| Benchmark | Methodology | Metric | Test | Validation |
|---|---|---|---|---|
| graph-regression-on-pcqm4m-lsc | MLP-fingerprint | MAE | 20.68 | 0.2044 |
| graph-regression-on-pcqm4m-lsc | GCN-Virtual | MAE | 15.79 | 0.1536 |
| graph-regression-on-pcqm4m-lsc | GCN | MAE | 18.38 | 0.1684 |
| graph-regression-on-pcqm4m-lsc | GIN-virtual | MAE | 14.87 | 0.1396 |
| graph-regression-on-pcqm4m-lsc | GIN | MAE | 16.78 | - |
| graph-regression-on-pcqm4mv2-lsc | MLP-Fingerprint | MAE | 0.1760 | 0.1753 |
| knowledge-graphs-on-wikikg90m-lsc | TransE-RoBERTa | MRR | 0.6288 | 0.6039 |
| knowledge-graphs-on-wikikg90m-lsc | ComplEx-Concat | MRR | 0.8637 | 0.8425 |
| knowledge-graphs-on-wikikg90m-lsc | TransE-Concat | MRR | 85.48 | 0.8494 |
| knowledge-graphs-on-wikikg90m-lsc | ComplEx-RoBERTa | MRR | 0.7186 | 0.7052 |
| node-classification-on-mag240m-lsc | R-GraphSAGE (NS) | Accuracy | 68.94 | - |
| node-classification-on-mag240m-lsc | SIGN | Accuracy | 66.09 | 66.64 |
| node-classification-on-mag240m-lsc | GAT (NS) | Accuracy | 66.63 | - |
| node-classification-on-mag240m-lsc | GraphSAGE (NS) | Accuracy | 66.25 | - |
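
The three metrics reported in the benchmarks above (MAE for graph regression, MRR for knowledge-graph link prediction, accuracy for node classification) can be sketched in plain Python as follows. This is a minimal illustration of the standard textbook definitions, not the official OGB-LSC evaluator code; the function names are hypothetical, the MRR helper assumes the 1-based rank of the correct answer has already been computed for each query, and accuracy is returned as a fraction rather than a percentage.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between targets and predictions (lower is better)."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over queries (higher is better).

    Assumption: each entry is the 1-based rank of the correct answer
    among the candidates scored for that query.
    """
    if len(ranks) == 0 or any(r < 1 for r in ranks):
        raise ValueError("ranks must be non-empty and 1-based")
    return sum(1.0 / r for r in ranks) / len(ranks)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label (higher is better)."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


if __name__ == "__main__":
    # Toy values for illustration only; they are not taken from any OGB-LSC dataset.
    print(mean_absolute_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))
    print(mean_reciprocal_rank([1, 2, 4]))
    print(accuracy([0, 1, 2, 2], [0, 1, 1, 2]))
```

MAE is a lower-is-better metric, while MRR and accuracy are higher-is-better, which is worth keeping in mind when comparing rows across the three benchmarks.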
