PyTorch-BigGraph: A Large-scale Graph Embedding System
Adam Lerer; Ledell Wu; Jiajun Shen; Timothee Lacroix; Luca Wehrstedt; Abhijit Bose; Alex Peysakhovich

Abstract
Graph embedding methods produce unsupervised node features from graphs that can then be used for a variety of machine learning tasks. Modern graphs, particularly in industrial applications, contain billions of nodes and trillions of edges, which exceeds the capability of existing embedding systems. We present PyTorch-BigGraph (PBG), an embedding system that incorporates several modifications to traditional multi-relation embedding systems that allow it to scale to graphs with billions of nodes and trillions of edges. PBG uses graph partitioning to train arbitrarily large embeddings on either a single machine or in a distributed environment. We demonstrate comparable performance with existing embedding systems on common benchmarks, while allowing for scaling to arbitrarily large graphs and parallelization on multiple machines. We train and evaluate embeddings on several large social network graphs as well as the full Freebase dataset, which contains over 100 million nodes and 2 billion edges.
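The partitioning scheme mentioned in the abstract can be illustrated with a small sketch. The idea (simplified here; all names and the hash scheme are illustrative, not the actual PBG API) is that nodes are assigned to one of P partitions, and each edge then falls into one of P×P "buckets" keyed by the partitions of its endpoints. Training iterates bucket by bucket, so only two partitions' worth of embeddings need to be resident in memory at once, which is what lets embedding tables grow beyond a single machine's RAM.

```python
# Hypothetical sketch of PBG-style entity partitioning and edge bucketing.
# partition_of() uses a simple modulo hash for illustration only.
from collections import defaultdict

def partition_of(node_id: int, num_partitions: int) -> int:
    # Assumed assignment scheme; PBG uses its own partition mapping.
    return node_id % num_partitions

def bucket_edges(edges, num_partitions):
    """Group edges into (src_partition, dst_partition) buckets."""
    buckets = defaultdict(list)
    for src, dst in edges:
        key = (partition_of(src, num_partitions),
               partition_of(dst, num_partitions))
        buckets[key].append((src, dst))
    return buckets

edges = [(0, 5), (1, 2), (3, 4), (5, 1)]
buckets = bucket_edges(edges, num_partitions=2)
# Edge (0, 5) lands in bucket (0, 1): source in partition 0, dest in partition 1.
```

During training, a bucket (i, j) requires loading only the embeddings for partitions i and j, and buckets that share no partition can be trained in parallel on different machines.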
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| link-prediction-on-fb15k-1 | PyTorch BigGraph (ComplEx) | Hits@10: 0.872; MRR (filtered): 0.79; MRR (raw): 0.242 |
| link-prediction-on-livejournal | PBG (1 partition) | Hits@10: 0.857; MR: 245.9 |
| link-prediction-on-livejournal | PyTorch BigGraph | MRR: 0.749 |
| link-prediction-on-youtube | PyTorch BigGraph | Macro F1: 40.9; Micro F1: 48.0 |