Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Abstract

Transfer learning, in which a model is first pre-trained on a data-rich task and then fine-tuned on a downstream task, is a powerful technique in natural language processing (NLP). Its effectiveness has given rise to a diversity of approaches, methodologies, and practices. This paper explores transfer learning techniques for NLP by introducing a unified framework that converts every text-based language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors across dozens of language understanding tasks. By combining the insights from our exploration with our new Colossal Clean Crawled Corpus, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
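Because the framework phrases every task as plain text in and plain text out, the released checkpoints share a single generation interface, with a task prefix in the input selecting the task. Below is a minimal sketch of that interface, assuming the pre-trained T5 checkpoints as exposed by huggingface/transformers (one of the repositories listed below):

```python
# Minimal sketch of the text-to-text interface: the same model and the
# same generate() call serve translation, summarization, classification,
# etc.; only the task prefix in the input text changes.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# A task prefix turns translation into ordinary text generation.
inputs = tokenizer(
    "translate English to German: The house is wonderful.",
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same checkpoint handles other prefixes such as "summarize: ..." without any task-specific head being added.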

Code Repositories

The following repositories mention this paper on GitHub (framework tag in parentheses where one is given):

- um-arm-lab/efficient-eng-2-ltl (pytorch)
- s-nlp/russe_detox_2022 (pytorch)
- conceptofmind/lamda-rlhf-pytorch (pytorch)
- amazon-science/chronos-forecasting (pytorch)
- conceptofmind/LaMDA-pytorch (pytorch)
- allenai/dolma
- jongwooko/nash-pruning-official (pytorch)
- gulucaptain/dynamictrl (pytorch)
- shivamraval98/multitask-t5_ae (pytorch)
- JunnYu/x-transformers-paddle (jax)
- vgaraujov/seq2seq-spanish-plms (pytorch)
- qipengguo/p2_webnlg2020 (pytorch)
- volcengine/vegiantmodel (pytorch)
- cccntu/ft5-demo-space (pytorch)
- airc-keti/ke-t5 (tf)
- allenai/c4-documentation
- google-research/t5x (jax)
- yizhongw/tk-instruct (pytorch)
- Ki6an/fastT5 (pytorch)
- skoltech-nlp/russe_detox_2022 (pytorch)
- KAGUYAHONGLAI/SRC (tf)
- facebookresearch/atlas (pytorch)
- huggingface/transformers (pytorch)
- zhiqic/chartreader (pytorch)
- dawn0815/UniSA (pytorch)
- xuetianci/pacit (pytorch)
- Sharif-SLPL/t5-fa (jax)
- thu-keg/omnievent (pytorch)
- lesterpjy/numeric-t5
- abelriboulot/onnxt5 (pytorch)
- ArvinZhuang/BiTAG (pytorch)
- thudm/swissarmytransformer (pytorch)
- itzprashu1/prashant (tf)
- junnyu/paddle_t5 (paddle)
- wangcongcong123/ttt (pytorch)
- cccntu/ft5-demo (pytorch)
- ibm/graph_ensemble_learning (pytorch)
- google/seqio (tf)
- asahi417/lmppl

Benchmarks

| Benchmark | Method | Metrics |
| --- | --- | --- |
| abstractive-text-summarization-on-cnn-daily | T5 | ROUGE-1: 43.52, ROUGE-2: 21.55, ROUGE-L: 40.69 |
| answer-generation-on-weibopolls | T5 | BLEU-1: 37.77, BLEU-3: 25.86, ROUGE-1: 46.20, ROUGE-L: 43.32 |
| common-sense-reasoning-on-record | T5-XXL 11B (fine-tuned) | EM: 93.4 |
| common-sense-reasoning-on-record | T5-11B | F1: 94.1 |
| coreference-resolution-on-winograd-schema | T5-XXL 11B (fine-tuned) | Accuracy: 93.8 |
| document-summarization-on-cnn-daily-mail | T5-11B | ROUGE-1: 43.52, ROUGE-2: 21.55, ROUGE-L: 40.69 |
| linguistic-acceptability-on-cola | T5-Base | Accuracy: 51.1% |
| linguistic-acceptability-on-cola | T5-XL 3B | Accuracy: 67.1% |
| linguistic-acceptability-on-cola | T5-Small | Accuracy: 41.0% |
| linguistic-acceptability-on-cola | T5-Large 770M | Accuracy: 61.2% |
| linguistic-acceptability-on-cola | T5-11B | Accuracy: 70.8% |
| machine-translation-on-wmt2014-english-french | T5 | BLEU score: 43.4 |
| machine-translation-on-wmt2014-english-german | T5-11B | BLEU score: 32.1, Number of Params: 11110M |
| multimodal-intent-recognition-on-photochat | T5-3B | F1: 58.9, Precision: 54.1, Recall: 64.6 |
| multimodal-intent-recognition-on-photochat | T5-base | F1: 58.1, Precision: 58.2, Recall: 57.9 |
| natural-language-inference-on-commitmentbank | T5-XXL 11B (fine-tuned) | Accuracy: 96.8, F1: 93.9 |
| natural-language-inference-on-commitmentbank | T5-Large 770M (fine-tuned) | Accuracy: 94.4, F1: 90.3 |
| natural-language-inference-on-commitmentbank | T5-Base 220M (fine-tuned) | Accuracy: 94, F1: 86.2 |
| natural-language-inference-on-multinli | T5-Base | Matched: 87.1, Mismatched: 86.2 |
| natural-language-inference-on-multinli | T5-3B | Matched: 91.4, Mismatched: 91.2 |
| natural-language-inference-on-multinli | T5-11B | Mismatched: 91.7 |
| natural-language-inference-on-multinli | T5-XXL 11B (fine-tuned) | Matched: 92.0 |
| natural-language-inference-on-multinli | T5-Large 770M | Mismatched: 89.6 |
| natural-language-inference-on-multinli | T5-Small | Matched: 82.4, Mismatched: 82.3 |
| natural-language-inference-on-multinli | T5-Large | Matched: 89.9 |
| natural-language-inference-on-qnli | T5-Small | Accuracy: 90.3% |
| natural-language-inference-on-qnli | T5-Base | Accuracy: 93.7% |
| natural-language-inference-on-qnli | T5-11B | Accuracy: 96.7% |
| natural-language-inference-on-qnli | T5-Large 770M | Accuracy: 94.8% |
| natural-language-inference-on-qnli | T5-3B | Accuracy: 96.3% |
| natural-language-inference-on-rte | T5-Large 770M | Accuracy: 87.2% |
| natural-language-inference-on-rte | T5-Base 220M | Accuracy: 80.1% |
| natural-language-inference-on-rte | T5-XL 3B | Accuracy: 91.1% |
| natural-language-inference-on-rte | T5-XXL 11B (fine-tuned) | Accuracy: 92.5% |
| natural-language-inference-on-rte | T5-Small | Accuracy: 69.9% |
| natural-language-inference-on-wnli | T5-Base 220M | Accuracy: 78.8 |
| natural-language-inference-on-wnli | T5-Large 770M | Accuracy: 85.6 |
| natural-language-inference-on-wnli | T5-XL 3B | Accuracy: 89.7 |
| natural-language-inference-on-wnli | T5-Small 60M | Accuracy: 69.2 |
| natural-language-inference-on-wnli | T5-XXL 11B | Accuracy: 93.2 |
| poll-generation-on-weibopolls | T5 | BLEU-1: 37.34, BLEU-3: 21.06, ROUGE-1: 45.33, ROUGE-L: 42.69 |
| question-answering-on-boolq | T5-Small 60M (fine-tuned) | Accuracy: 76.4 |
| question-answering-on-boolq | T5-Base 220M (fine-tuned) | Accuracy: 81.4 |
| question-answering-on-boolq | T5-XXL 11B (fine-tuned) | Accuracy: 91.2 |
| question-answering-on-boolq | T5-Large 770M (fine-tuned) | Accuracy: 85.4 |
| question-answering-on-copa | T5-XL 3B (fine-tuned) | Accuracy: 92 |
| question-answering-on-copa | T5-XXL 11B (fine-tuned) | Accuracy: 94.8 |
| question-answering-on-copa | T5-Large 770M (fine-tuned) | Accuracy: 83.4 |
| question-answering-on-copa | T5-Base 220M (fine-tuned) | Accuracy: 71.2 |
| question-answering-on-multirc | T5-XXL 11B (fine-tuned) | F1: 88.1 |
| question-answering-on-multirc | T5-11B | EM: 63.3 |
| question-answering-on-quora-question-pairs | T5-11B | Accuracy: 90.4% |
| question-answering-on-quora-question-pairs | T5-Small | Accuracy: 88.0% |
| question-answering-on-quora-question-pairs | T5-Base | Accuracy: 89.4% |
| question-answering-on-quora-question-pairs | T5-3B | Accuracy: 89.7% |
| question-answering-on-quora-question-pairs | T5-Large 770M | Accuracy: 89.9% |
| question-answering-on-squad11-dev | T5-3B | EM: 88.53, F1: 94.95 |
| question-answering-on-squad11-dev | T5-Small | EM: 79.1, F1: 87.24 |
| question-answering-on-squad11-dev | T5-Base | EM: 85.44, F1: 92.08 |
| question-answering-on-squad11-dev | T5-11B | EM: 90.06, F1: 95.64 |
| question-answering-on-squad11-dev | T5-Large 770M | EM: 86.66, F1: 93.79 |
| question-answering-on-webquestions | T5.1.1-XXL+SSM | EM: 42.8 |
| question-generation-on-weibopolls | T5 | BLEU-1: 36.91, BLEU-3: 16.26, ROUGE-1: 44.46, ROUGE-L: 42.06 |
| semantic-parsing-on-webquestionssp | T5-11B (Raffel et al., 2020) | Accuracy: 56.5 |
| semantic-textual-similarity-on-mrpc | T5-3B | Accuracy: 89.2%, F1: 92.5 |
| semantic-textual-similarity-on-mrpc | T5-Large | Accuracy: 89.9%, F1: 92.4 |
| semantic-textual-similarity-on-mrpc | T5-Small | Accuracy: 86.6%, F1: 89.7 |
| semantic-textual-similarity-on-mrpc | T5-11B | Accuracy: 90.0%, F1: 91.9 |
| semantic-textual-similarity-on-mrpc | T5-Base | Accuracy: 87.5%, F1: 90.7 |
| semantic-textual-similarity-on-sts-benchmark | T5-Large 770M | Spearman Correlation: 0.886 |
| semantic-textual-similarity-on-sts-benchmark | T5-Small | Pearson Correlation: 0.856, Spearman Correlation: 0.85 |
| semantic-textual-similarity-on-sts-benchmark | T5-11B | Pearson Correlation: 0.925, Spearman Correlation: 0.921 |
| semantic-textual-similarity-on-sts-benchmark | T5-Base | Pearson Correlation: 0.894 |
| semantic-textual-similarity-on-sts-benchmark | T5-Large | Pearson Correlation: 0.899 |
| semantic-textual-similarity-on-sts-benchmark | T5-3B | Pearson Correlation: 0.906, Spearman Correlation: 0.898 |
| sentiment-analysis-on-sst-2-binary | T5-11B | Accuracy: 97.5 |
| sentiment-analysis-on-sst-2-binary | T5-3B | Accuracy: 97.4 |
| sentiment-analysis-on-sst-2-binary | T5-Large 770M | Accuracy: 96.3 |
| sentiment-analysis-on-sst-2-binary | T5-Small | Accuracy: 91.8 |
| sentiment-analysis-on-sst-2-binary | T5-Base | Accuracy: 95.2 |
| word-sense-disambiguation-on-words-in-context | T5-XXL 11B | Accuracy: 76.9 |
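For the classification benchmarks above, the text-to-text setup means the model literally generates the label as a string, so a metric such as accuracy reduces to an exact string match between the generated text and the target text. A minimal sketch of that scoring rule (the label strings here are illustrative):

```python
# Exact-match accuracy over generated label strings, as used when a
# classification task is cast as text generation.
def exact_match_accuracy(predictions: list[str], targets: list[str]) -> float:
    """Fraction of predictions that exactly match their target string."""
    assert len(predictions) == len(targets)
    matches = sum(p.strip() == t.strip() for p, t in zip(predictions, targets))
    return matches / len(targets)

# Two of three generated labels match the references -> 0.667
print(exact_match_accuracy(
    ["positive", "negative", "positive"],
    ["positive", "negative", "negative"],
))
```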
