Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya

Abstract
Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of Transformers. For one, we replace dot-product attention by one that uses locality-sensitive hashing, changing its complexity from $O(L^2)$ to $O(L \log L)$, where $L$ is the length of the sequence. Furthermore, we use reversible residual layers instead of the standard residuals, which allows storing activations only once in the training process instead of $N$ times, where $N$ is the number of layers. The resulting model, the Reformer, performs on par with Transformer models while being much more memory-efficient and much faster on long sequences.
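Both techniques can be sketched in a few lines of NumPy. The first block below is a minimal, single-round illustration of LSH attention: queries and keys are shared (as in the paper), positions are hashed with random rotations, sorted by bucket, and attention is computed only within fixed-size chunks so no $L \times L$ score matrix is ever formed. The helper names (`lsh_hash`, `lsh_attention`) and the default hyperparameters are our own illustrative choices, not the paper's Trax implementation, which additionally uses multiple hash rounds, causal masking, and attention to the neighboring chunk.

```python
import numpy as np

def lsh_hash(x, n_buckets, rng):
    """Angular LSH: project vectors onto random directions and take the
    argmax over [xR; -xR], so nearby vectors tend to share a bucket."""
    d_model = x.shape[-1]
    random_rotations = rng.normal(size=(d_model, n_buckets // 2))
    rotated = x @ random_rotations                       # (seq_len, n_buckets // 2)
    return np.argmax(np.concatenate([rotated, -rotated], axis=-1), axis=-1)

def lsh_attention(qk, v, n_buckets=16, chunk_size=64, rng=None):
    """Single-round LSH attention sketch: hash, sort by bucket, attend
    within each chunk only, then undo the sort."""
    if rng is None:
        rng = np.random.default_rng(0)
    seq_len, d_model = qk.shape
    buckets = lsh_hash(qk, n_buckets, rng)
    order = np.argsort(buckets, kind="stable")           # group same-bucket positions
    qk_sorted, v_sorted = qk[order], v[order]

    out_sorted = np.zeros_like(v_sorted)
    for start in range(0, seq_len, chunk_size):
        sl = slice(start, start + chunk_size)
        q, k, val = qk_sorted[sl], qk_sorted[sl], v_sorted[sl]
        scores = q @ k.T / np.sqrt(d_model)              # (chunk, chunk), not (L, L)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out_sorted[sl] = weights @ val

    out = np.empty_like(out_sorted)
    out[order] = out_sorted                              # restore original positions
    return out

# Example usage with illustrative sizes.
rng = np.random.default_rng(42)
qk = rng.normal(size=(1024, 64))
v = rng.normal(size=(1024, 64))
out = lsh_attention(qk, v)
```

The reversible residual layers can be summarized even more briefly: each block's outputs determine its inputs exactly, so activations can be recomputed during the backward pass instead of being stored for every layer. The sketch below assumes arbitrary callables `F` and `G` standing in for the attention and feed-forward sublayers.

```python
def reversible_forward(x1, x2, F, G):
    """Reversible residual block: y1 = x1 + F(x2), y2 = x2 + G(y1)."""
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def reversible_invert(y1, y2, F, G):
    """Recover the block's inputs from its outputs, so per-layer
    activations need not be kept in memory during training."""
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2
```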
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| D4RL | Reformer | Average Reward: 63.9 |
| Image Generation on ImageNet 64x64 | Reformer (6 layers) | Bits per dim: 3.740 |
| Image Generation on ImageNet 64x64 | Reformer (12 layers) | Bits per dim: 3.710 |
| Language Modelling on WikiText-103 | Reformer 125M | Test perplexity: 26.0 |
| Open-Domain Question Answering on SearchQA | Locality-Sensitive Hashing | EM: 66.0 |
| Question Answering on Natural Questions (Long) | Locality-Sensitive Hashing | F1: 75.5 |
| Question Answering on Quasar-T | Locality-Sensitive Hashing | EM: 53.2 |