5 months ago

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Yinhan Liu; Myle Ott; Naman Goyal; Jingfei Du; Mandar Joshi; Danqi Chen; Omer Levy; Mike Lewis; Luke Zettlemoyer; Veselin Stoyanov

Abstract

Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These results highlight the importance of previously overlooked design choices, and raise questions about the source of recently reported improvements. We release our models and code.

Code Repositories

Repository	Framework	Status
hkuds/easyrec	pytorch	Mentioned in GitHub
SindhuMadi/FakeNewsDetection	-	Mentioned in GitHub
expertailab/spaceqa	pytorch	Mentioned in GitHub
Karthik-Bhaskar/Context-Based-Question-Answering	-	Mentioned in GitHub
lvyufeng/bert4ms	mindspore	-
awslabs/mlm-scoring	mxnet	Mentioned in GitHub
haisongzhang/roberta-tiny-cased	-	Mentioned in GitHub
10Exahertz/Text-ResNet-on-Sentimental-LIAR-Fake-News	-	Mentioned in GitHub
common-english/bert-all	pytorch	Mentioned in GitHub
obi-ml-public/ehr_deidentification	-	Mentioned in GitHub
pytorch/fairseq	pytorch	Official
benywon/ReCO	pytorch	Mentioned in GitHub
NathanDuran/Sentence-Encoding-for-DA-Classification	-	Mentioned in GitHub
bluejurand/Kaggle_QA_Google_Labeling	-	Mentioned in GitHub
UnknownGenie/altered-BERT-KPE	pytorch	Mentioned in GitHub
pwc-1/Paper-9/tree/main/5/xlm_roberta	mindspore	-
knuddj1/op_text	pytorch	Mentioned in GitHub
xiaoqian19940510/text-classification-surveys	pytorch	Mentioned in GitHub
znhy1024/protoco	pytorch	Mentioned in GitHub
CalumPerrio/WNUT-2020	pytorch	Mentioned in GitHub
zfj1998/CodeBert-Code2Text	pytorch	Mentioned in GitHub
simon-benigeri/narrative-generation	pytorch	Mentioned in GitHub
dig-team/hanna-benchmark-asg	pytorch	Mentioned in GitHub
G-4-R-Y/Tweet-Sentiment-Extraction-roBERTa-5fold	-	Mentioned in GitHub
flexible-fl/flex-nlp	-	Mentioned in GitHub
tighu20/Kaggle-Tweet-Sentiment-Extraction	-	Mentioned in GitHub
salesforce/codet5	pytorch	Mentioned in GitHub
ricaelum42/Contextual-Twitter-Sarcasm-Detection	pytorch	Mentioned in GitHub
musixmatchresearch/umberto	pytorch	Mentioned in GitHub
facebookresearch/anli	pytorch	Mentioned in GitHub
GeorgeLuImmortal/Hierarchical-BERT-Model-with-Limited-Labelled-Data	pytorch	Mentioned in GitHub
nguyenvulebinh/vietnamese-roberta	pytorch	Mentioned in GitHub
viethoang1512/kpa	pytorch	Mentioned in GitHub
knuddy/op_text	pytorch	Mentioned in GitHub
wzzzd/LM_NER	pytorch	Mentioned in GitHub
sdadas/polish-roberta	pytorch	Mentioned in GitHub
duanchi1230/NLP_Project_AI2_Reasoning_Challenge	pytorch	Mentioned in GitHub
Tencent/TurboTransformers	pytorch	Mentioned in GitHub
abdumaa/hiqualprop	pytorch	Mentioned in GitHub
devhemza/BERTweet_sentiment_analysis	pytorch	Mentioned in GitHub
eternityyw/tram-benchmark	-	Mentioned in GitHub
huggingface/transformers	pytorch	Mentioned in GitHub
abhishekanand1710/noiseandbias	-	Mentioned in GitHub
PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/transformers/roberta/modeling.py	paddle	-
oneflow-inc/libai	-	Mentioned in GitHub
bfopengradient/NLP_ROBERTA	-	Mentioned in GitHub
clovaai/textual-kd-slu	pytorch	Mentioned in GitHub
yangyucheng000/University/tree/main/model-3/roberta	mindspore	-
xiaoqian19940510/text-classification-	pytorch	Mentioned in GitHub
aistairc/kirt_bert_on_abci	pytorch	Mentioned in GitHub
2023-MindSpore-1/ms-code-163	mindspore	-
pisalore/roberta_results	pytorch	Mentioned in GitHub
bcaitech1/p2-klue-Heeseok-Jeong	pytorch	Mentioned in GitHub
G-4-R-Y/Tweet-Sentiment-Extraction	-	Mentioned in GitHub
mthcom/hscore-dataset-pruning	pytorch	Mentioned in GitHub
MS-P3/code7/tree/main/xlm_roberta_xl	mindspore	-
lashoun/hanna-benchmark-asg	pytorch	Mentioned in GitHub
kaushaltrivedi/fast-bert	pytorch	Mentioned in GitHub
octanove/shiba	pytorch	Mentioned in GitHub
traviscoan/cards	-	Mentioned in GitHub
utterworks/fast-bert	pytorch	Mentioned in GitHub
zaradana/Fast_BERT	pytorch	Mentioned in GitHub
IndicoDataSolutions/finetune	-	Mentioned in GitHub
brightmart/roberta_zh	-	Mentioned in GitHub
few-shot-NER-benchmark/BaselineCode	pytorch	Mentioned in GitHub
ibm/vira-intent-discovery	-	Mentioned in GitHub
blawok/named-entity-recognition	pytorch	Mentioned in GitHub

Benchmarks

Benchmark	Methodology	Metrics
common-sense-reasoning-on-commonsenseqa	RoBERTa-Large 355M	Accuracy: 72.1
common-sense-reasoning-on-swag	RoBERTa	Test: 89.9
document-image-classification-on-rvl-cdip	RoBERTa base	Accuracy: 90.06 Parameters: 125M
linguistic-acceptability-on-cola	RoBERTa (ensemble)	Accuracy: 67.8%
multi-task-language-understanding-on-mmlu	RoBERTa-base 125M (fine-tuned)	Average (%): 27.9
natural-language-inference-on-anli-test	RoBERTa (Large)	A1: 72.4 A2: 49.8 A3: 44.4
natural-language-inference-on-multinli	RoBERTa	Matched: 90.8
natural-language-inference-on-multinli	RoBERTa (ensemble)	Mismatched: 90.2
natural-language-inference-on-qnli	RoBERTa (ensemble)	Accuracy: 98.9%
natural-language-inference-on-rte	RoBERTa	Accuracy: 88.2%
natural-language-inference-on-rte	RoBERTa (ensemble)	Accuracy: 88.2%
natural-language-inference-on-wnli	RoBERTa (ensemble)	Accuracy: 89
question-answering-on-piqa	RoBERTa-Large 355M	Accuracy: 79.4
question-answering-on-quora-question-pairs	RoBERTa (ensemble)	Accuracy: 90.2%
question-answering-on-social-iqa	RoBERTa-Large 355M (fine-tuned)	Accuracy: 76.7
question-answering-on-squad20	RoBERTa (single model)	EM: 86.820 F1: 89.795
question-answering-on-squad20-dev	RoBERTa (no data aug)	EM: 86.5 F1: 89.4
reading-comprehension-on-race	RoBERTa	Accuracy: 83.2 Accuracy (High): 81.3 Accuracy (Middle): 86.5
semantic-textual-similarity-on-mrpc	RoBERTa (ensemble)	Accuracy: 92.3%
semantic-textual-similarity-on-sts-benchmark	RoBERTa	Pearson Correlation: 0.922
sentiment-analysis-on-sst-2-binary	RoBERTa (ensemble)	Accuracy: 96.7
stock-market-prediction-on-astock	RoBERTa WWM Ext (News+Factors)	Accuracy: 62.49 F1-score: 62.54 Precision: 62.59 Recall: 62.51
stock-market-prediction-on-astock	RoBERTa WWM Ext (News)	Accuracy: 61.34 F1-score: 61.48 Precision: 61.97 Recall: 61.32
task-1-grouping-on-ocw	RoBERTa (LARGE)	# Correct Groups: 29 ± 3 # Solved Walls: 0 ± 0 Adjusted Mutual Information (AMI): 9.4 ± 0.4 Adjusted Rand Index (ARI): 8.4 ± 0.3 Fowlkes Mallows Score (FMS): 26.7 ± 0.2 Wasserstein Distance (WD): 88.4 ± 0.4
text-classification-on-arxiv-10	RoBERTa	Accuracy: 0.779
type-prediction-on-manytypes4typescript	RoBERTa	Average Accuracy: 59.84 Average F1: 57.54 Average Precision: 57.45 Average Recall: 57.62
