
Abstract
Language model pretraining has led to significant performance gains, but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have a significant impact on the final results. We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These results highlight the importance of previously overlooked design choices, and raise questions about the source of recently reported improvements. We release our models and code.
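The released checkpoints can be loaded through several of the repositories listed below. As a minimal sketch, this queries the pretrained model as a masked language model through the huggingface/transformers library; the `roberta-base` name is the checkpoint published on the Hugging Face hub, and the surrounding code is ordinary `transformers`/`torch` usage:

```python
# Minimal sketch: query the released RoBERTa checkpoint as a masked LM.
# Assumes `transformers` and `torch` are installed; "roberta-base" is the
# checkpoint name published on the Hugging Face hub.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")
model.eval()

# RoBERTa uses "<mask>" as its mask token.
text = "RoBERTa is pretrained with a masked <mask> modeling objective."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the masked position and read off the top-5 predictions.
mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top_ids = logits[0, mask_pos].topk(5).indices[0]
print([tokenizer.decode(i).strip() for i in top_ids])
```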
Code Repositories
| Repository | Framework | Notes |
|---|---|---|
| hkuds/easyrec | pytorch | Mentioned in GitHub |
| SindhuMadi/FakeNewsDetection | | Mentioned in GitHub |
| expertailab/spaceqa | pytorch | Mentioned in GitHub |
| Karthik-Bhaskar/Context-Based-Question-Answering | tf | Mentioned in GitHub |
| lvyufeng/bert4ms | mindspore | |
| awslabs/mlm-scoring | mxnet | Mentioned in GitHub |
| haisongzhang/roberta-tiny-cased | | Mentioned in GitHub |
| common-english/bert-all | pytorch | Mentioned in GitHub |
| obi-ml-public/ehr_deidentification | | Mentioned in GitHub |
| pytorch/fairseq | pytorch | Official |
| benywon/ReCO | pytorch | Mentioned in GitHub |
| bluejurand/Kaggle_QA_Google_Labeling | tf | Mentioned in GitHub |
| UnknownGenie/altered-BERT-KPE | pytorch | Mentioned in GitHub |
| knuddj1/op_text | pytorch | Mentioned in GitHub |
| xiaoqian19940510/text-classification-surveys | pytorch | Mentioned in GitHub |
| znhy1024/protoco | pytorch | Mentioned in GitHub |
| CalumPerrio/WNUT-2020 | pytorch | Mentioned in GitHub |
| zfj1998/CodeBert-Code2Text | pytorch | Mentioned in GitHub |
| simon-benigeri/narrative-generation | pytorch | Mentioned in GitHub |
| dig-team/hanna-benchmark-asg | pytorch | Mentioned in GitHub |
| flexible-fl/flex-nlp | | Mentioned in GitHub |
| tighu20/Kaggle-Tweet-Sentiment-Extraction | tf | Mentioned in GitHub |
| salesforce/codet5 | pytorch | Mentioned in GitHub |
| ricaelum42/Contextual-Twitter-Sarcasm-Detection | pytorch | Mentioned in GitHub |
| musixmatchresearch/umberto | pytorch | Mentioned in GitHub |
| facebookresearch/anli | pytorch | Mentioned in GitHub |
| GeorgeLuImmortal/Hierarchical-BERT-Model-with-Limited-Labelled-Data | pytorch | Mentioned in GitHub |
| nguyenvulebinh/vietnamese-roberta | pytorch | Mentioned in GitHub |
| viethoang1512/kpa | pytorch | Mentioned in GitHub |
| knuddy/op_text | pytorch | Mentioned in GitHub |
| wzzzd/LM_NER | pytorch | Mentioned in GitHub |
| sdadas/polish-roberta | pytorch | Mentioned in GitHub |
| duanchi1230/NLP_Project_AI2_Reasoning_Challenge | pytorch | Mentioned in GitHub |
| Tencent/TurboTransformers | pytorch | Mentioned in GitHub |
| abdumaa/hiqualprop | pytorch | Mentioned in GitHub |
| devhemza/BERTweet_sentiment_analysis | pytorch | Mentioned in GitHub |
| eternityyw/tram-benchmark | | Mentioned in GitHub |
| huggingface/transformers | pytorch | Mentioned in GitHub |
| abhishekanand1710/noiseandbias | | Mentioned in GitHub |
| oneflow-inc/libai | | Mentioned in GitHub |
| bfopengradient/NLP_ROBERTA | | Mentioned in GitHub |
| clovaai/textual-kd-slu | pytorch | Mentioned in GitHub |
| xiaoqian19940510/text-classification- | pytorch | Mentioned in GitHub |
| aistairc/kirt_bert_on_abci | pytorch | Mentioned in GitHub |
| 2023-MindSpore-1/ms-code-163 | mindspore | |
| pisalore/roberta_results | pytorch | Mentioned in GitHub |
| bcaitech1/p2-klue-Heeseok-Jeong | pytorch | Mentioned in GitHub |
| G-4-R-Y/Tweet-Sentiment-Extraction | | Mentioned in GitHub |
| mthcom/hscore-dataset-pruning | pytorch | Mentioned in GitHub |
| MS-P3/code7/tree/main/xlm_roberta_xl | mindspore | |
| lashoun/hanna-benchmark-asg | pytorch | Mentioned in GitHub |
| kaushaltrivedi/fast-bert | pytorch | Mentioned in GitHub |
| octanove/shiba | pytorch | Mentioned in GitHub |
| traviscoan/cards | | Mentioned in GitHub |
| utterworks/fast-bert | pytorch | Mentioned in GitHub |
| zaradana/Fast_BERT | pytorch | Mentioned in GitHub |
| IndicoDataSolutions/finetune | tf | Mentioned in GitHub |
| brightmart/roberta_zh | tf | Mentioned in GitHub |
| few-shot-NER-benchmark/BaselineCode | pytorch | Mentioned in GitHub |
| ibm/vira-intent-discovery | | Mentioned in GitHub |
| blawok/named-entity-recognition | pytorch | Mentioned in GitHub |
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| common-sense-reasoning-on-commonsenseqa | RoBERTa-Large 355M | Accuracy: 72.1 |
| common-sense-reasoning-on-swag | RoBERTa | Test: 89.9 |
| document-image-classification-on-rvl-cdip | RoBERTa base | Accuracy: 90.06, Parameters: 125M |
| linguistic-acceptability-on-cola | RoBERTa (ensemble) | Accuracy: 67.8% |
| multi-task-language-understanding-on-mmlu | RoBERTa-base 125M (fine-tuned) | Average (%): 27.9 |
| natural-language-inference-on-anli-test | RoBERTa (Large) | A1: 72.4, A2: 49.8, A3: 44.4 |
| natural-language-inference-on-multinli | RoBERTa | Matched: 90.8 |
| natural-language-inference-on-multinli | RoBERTa (ensemble) | Mismatched: 90.2 |
| natural-language-inference-on-qnli | RoBERTa (ensemble) | Accuracy: 98.9% |
| natural-language-inference-on-rte | RoBERTa | Accuracy: 88.2% |
| natural-language-inference-on-rte | RoBERTa (ensemble) | Accuracy: 88.2% |
| natural-language-inference-on-wnli | RoBERTa (ensemble) | Accuracy: 89 |
| question-answering-on-piqa | RoBERTa-Large 355M | Accuracy: 79.4 |
| question-answering-on-quora-question-pairs | RoBERTa (ensemble) | Accuracy: 90.2% |
| question-answering-on-social-iqa | RoBERTa-Large 355M (fine-tuned) | Accuracy: 76.7 |
| question-answering-on-squad20 | RoBERTa (single model) | EM: 86.820, F1: 89.795 |
| question-answering-on-squad20-dev | RoBERTa (no data aug) | EM: 86.5, F1: 89.4 |
| reading-comprehension-on-race | RoBERTa | Accuracy: 83.2, Accuracy (High): 81.3, Accuracy (Middle): 86.5 |
| semantic-textual-similarity-on-mrpc | RoBERTa (ensemble) | Accuracy: 92.3% |
| semantic-textual-similarity-on-sts-benchmark | RoBERTa | Pearson Correlation: 0.922 |
| sentiment-analysis-on-sst-2-binary | RoBERTa (ensemble) | Accuracy: 96.7 |
| stock-market-prediction-on-astock | RoBERTa WWM Ext (News+Factors) | Accuracy: 62.49, F1-score: 62.54, Precision: 62.59, Recall: 62.51 |
| stock-market-prediction-on-astock | RoBERTa WWM Ext (News) | Accuracy: 61.34, F1-score: 61.48, Precision: 61.97, Recall: 61.32 |
| task-1-grouping-on-ocw | RoBERTa (LARGE) | # Correct Groups: 29 ± 3, # Solved Walls: 0 ± 0, Adjusted Mutual Information (AMI): 9.4 ± .4, Adjusted Rand Index (ARI): 8.4 ± .3, Fowlkes Mallows Score (FMS): 26.7 ± .2, Wasserstein Distance (WD): 88.4 ± .4 |
| text-classification-on-arxiv-10 | RoBERTa | Accuracy: 0.779 |
| type-prediction-on-manytypes4typescript | RoBERTa | Average Accuracy: 59.84, Average F1: 57.54, Average Precision: 57.45, Average Recall: 57.62 |
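The GLUE rows above come from fine-tuning the pretrained checkpoint once per task. Below is a hedged sketch of that recipe for a single task (SST-2), using the `transformers` Trainer and the `datasets` GLUE loader; the hyperparameter values are illustrative placeholders, not the paper's actual settings or search space:

```python
# Illustrative GLUE-style fine-tuning on SST-2 with the Trainer API.
# Assumes `transformers`, `datasets`, and `torch` are installed; the
# hyperparameters below are placeholders, not the paper's exact settings.
import numpy as np
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2)

# SST-2 from the GLUE benchmark; fixed-length padding keeps the default
# data collator usable without a tokenizer-aware padding collator.
dataset = load_dataset("glue", "sst2")
encoded = dataset.map(
    lambda batch: tokenizer(batch["sentence"], truncation=True,
                            padding="max_length", max_length=128),
    batched=True,
)

def accuracy(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": float((np.argmax(logits, -1) == labels).mean())}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="roberta-sst2",
                           learning_rate=2e-5,  # placeholder values,
                           num_train_epochs=3,  # not the paper's sweep
                           per_device_train_batch_size=16),
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    compute_metrics=accuracy,
)
trainer.train()
print(trainer.evaluate())  # dev-set accuracy, comparable to the SST-2 row
```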