RealFormer: Transformer Likes Residual Attention
Ruining He, Anirudh Ravula, Bhargav Kanagal, Joshua Ainslie

Abstract
Transformer is the backbone of modern NLP models. In this paper, we propose RealFormer, a simple and generic technique to create Residual Attention Layer Transformer networks that significantly outperform the canonical Transformer and its variants (BERT, ETC, etc.) on a wide spectrum of tasks including Masked Language Modeling, GLUE, SQuAD, Neural Machine Translation, WikiHop, HotpotQA, Natural Questions, and OpenKP. We also observe empirically that RealFormer stabilizes training and leads to models with sparser attention. Source code and pre-trained checkpoints for RealFormer can be found at https://github.com/google-research/google-research/tree/master/realformer.
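The core idea behind RealFormer is a residual connection on the attention scores themselves: each layer adds the previous layer's pre-softmax attention logits to its own logits before applying softmax. The sketch below illustrates this idea, assuming a PyTorch-style multi-head self-attention module; the class name, dimensions, and parameter names are illustrative and not taken from the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualAttention(nn.Module):
    """Multi-head self-attention with a residual connection on the
    pre-softmax attention scores (the RealFormer idea), as a minimal sketch."""

    def __init__(self, d_model=768, num_heads=12):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, prev_scores=None):
        # x: (batch, seq_len, d_model)
        b, n, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, seq_len, d_head).
        q = q.view(b, n, self.num_heads, self.d_head).transpose(1, 2)
        k = k.view(b, n, self.num_heads, self.d_head).transpose(1, 2)
        v = v.view(b, n, self.num_heads, self.d_head).transpose(1, 2)

        # Raw (pre-softmax) attention scores: (batch, heads, seq_len, seq_len).
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        # Residual attention: add the previous layer's raw scores, if given.
        if prev_scores is not None:
            scores = scores + prev_scores

        attn = F.softmax(scores, dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(b, n, -1)
        # Return the raw scores so the next layer can add them to its own.
        return self.out(ctx), scores
```

In a full encoder built this way, each layer's returned `scores` would be passed as `prev_scores` to the next layer, so the residual "edge" runs along the attention logits through the whole stack.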
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| Linguistic Acceptability on CoLA | RealFormer | Accuracy: 59.83% |
| Natural Language Inference on MultiNLI | RealFormer | Matched: 86.28, Mismatched: 86.34 |
| Natural Language Inference on QNLI | RealFormer | Accuracy: 91.89% |
| Natural Language Inference on RTE | RealFormer | Accuracy: 73.7% |
| Paraphrase Identification on Quora Question Pairs | RealFormer | Accuracy: 91.34, F1: 88.28 |
| Semantic Textual Similarity on MRPC | RealFormer | Accuracy: 87.01%, F1: 90.91% |
| Semantic Textual Similarity on STS-Benchmark | RealFormer | Pearson Correlation: 0.9011, Spearman Correlation: 0.8988 |
| Sentiment Analysis on SST-2 (Binary) | RealFormer | Accuracy: 94.04 |