Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Colin Raffel; Noam Shazeer; Adam Roberts; Katherine Lee; Sharan Narang; Michael Matena; Yanqi Zhou; Wei Li; Peter J. Liu

Abstract

Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
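The unified text-to-text format described in the abstract can be sketched as a simple serialization step: every task, whether classification, regression, or generation, becomes a prefixed input string mapped to an output string. The helper function below is illustrative only (it is not part of any released T5 API); the task prefixes follow the examples given in the paper.

```python
# Illustrative sketch of T5's text-to-text framing: each task instance is
# serialized into a single input string with a task-specific prefix, and
# the model is trained to emit the target as a string. The helper itself
# is hypothetical; the prefixes mirror the paper's examples.

def to_text_to_text(task: str, **fields) -> str:
    """Serialize a task instance into one prefixed input string."""
    if task == "translate_en_de":          # machine translation
        return "translate English to German: " + fields["text"]
    if task == "cola":                     # acceptability classification
        return "cola sentence: " + fields["sentence"]
    if task == "stsb":                     # regression, rendered as text
        return (f"stsb sentence1: {fields['sentence1']} "
                f"sentence2: {fields['sentence2']}")
    if task == "summarize":                # abstractive summarization
        return "summarize: " + fields["document"]
    raise ValueError(f"unknown task: {task}")

print(to_text_to_text("translate_en_de", text="That is good."))
# translate English to German: That is good.
```

Because every task shares this single string-in, string-out interface, the same model, loss, and decoding procedure can be reused across all of them, which is what enables the paper's uniform comparison of objectives and architectures.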

Code Repositories

google-research/t5x_retrieval (JAX)
um-arm-lab/efficient-eng-2-ltl (PyTorch)
s-nlp/russe_detox_2022 (PyTorch)
conceptofmind/lamda-rlhf-pytorch (PyTorch)
amazon-science/chronos-forecasting (PyTorch)
conceptofmind/LaMDA-pytorch (PyTorch)
allenai/dolma
jongwooko/nash-pruning-official (PyTorch)
gulucaptain/dynamictrl (PyTorch)
shivamraval98/multitask-t5_ae (PyTorch)
JunnYu/x-transformers-paddle (JAX)
vgaraujov/seq2seq-spanish-plms (PyTorch)
qipengguo/p2_webnlg2020 (PyTorch)
volcengine/vegiantmodel (PyTorch)
cccntu/ft5-demo-space (PyTorch)
airc-keti/ke-t5 (TensorFlow)
allenai/c4-documentation
google-research/t5x (JAX)
yizhongw/tk-instruct (PyTorch)
Ki6an/fastT5 (PyTorch)
skoltech-nlp/russe_detox_2022 (PyTorch)
KAGUYAHONGLAI/SRC (TensorFlow)
facebookresearch/atlas (PyTorch)
huggingface/transformers (PyTorch)
zhiqic/chartreader (PyTorch)
dawn0815/UniSA (PyTorch)
xuetianci/pacit (PyTorch)
Sharif-SLPL/t5-fa (JAX)
thu-keg/omnievent (PyTorch)
lesterpjy/numeric-t5
abelriboulot/onnxt5 (PyTorch)
ArvinZhuang/BiTAG (PyTorch)
thudm/swissarmytransformer (PyTorch)
itzprashu1/prashant (TensorFlow)
junnyu/paddle_t5 (PaddlePaddle)
luomancs/retriever_reader_for_okvqa (PyTorch)
wangcongcong123/ttt (PyTorch)
cccntu/ft5-demo (PyTorch)
ibm/graph_ensemble_learning (PyTorch)
google/seqio (TensorFlow)
asahi417/lmppl

Benchmarks

Benchmark | Methodology | Metrics
abstractive-text-summarization-on-cnn-daily | T5 | ROUGE-1: 43.52, ROUGE-2: 21.55, ROUGE-L: 40.69
answer-generation-on-weibopolls | T5 | BLEU-1: 37.77, BLEU-3: 25.86, ROUGE-1: 46.20, ROUGE-L: 43.32
common-sense-reasoning-on-record | T5-XXL 11B (fine-tuned) | EM: 93.4
common-sense-reasoning-on-record | T5-11B | F1: 94.1
coreference-resolution-on-winograd-schema | T5-XXL 11B (fine-tuned) | Accuracy: 93.8
document-summarization-on-cnn-daily-mail | T5-11B | ROUGE-1: 43.52, ROUGE-2: 21.55, ROUGE-L: 40.69
linguistic-acceptability-on-cola | T5-Base | Accuracy: 51.1%
linguistic-acceptability-on-cola | T5-XL 3B | Accuracy: 67.1%
linguistic-acceptability-on-cola | T5-Small | Accuracy: 41.0%
linguistic-acceptability-on-cola | T5-Large 770M | Accuracy: 61.2%
linguistic-acceptability-on-cola | T5-11B | Accuracy: 70.8%
machine-translation-on-wmt2014-english-french | T5 | BLEU score: 43.4
machine-translation-on-wmt2014-english-german | T5-11B | BLEU score: 32.1, Number of Params: 11110M
multimodal-intent-recognition-on-photochat | T5-3B | F1: 58.9, Precision: 54.1, Recall: 64.6
multimodal-intent-recognition-on-photochat | T5-base | F1: 58.1, Precision: 58.2, Recall: 57.9
natural-language-inference-on-commitmentbank | T5-XXL 11B (fine-tuned) | Accuracy: 96.8, F1: 93.9
natural-language-inference-on-commitmentbank | T5-Large 770M (fine-tuned) | Accuracy: 94.4, F1: 90.3
natural-language-inference-on-commitmentbank | T5-Base 220M (fine-tuned) | Accuracy: 94, F1: 86.2
natural-language-inference-on-multinli | T5-Base | Matched: 87.1, Mismatched: 86.2
natural-language-inference-on-multinli | T5-3B | Matched: 91.4, Mismatched: 91.2
natural-language-inference-on-multinli | T5-11B | Mismatched: 91.7
natural-language-inference-on-multinli | T5-XXL 11B (fine-tuned) | Matched: 92.0
natural-language-inference-on-multinli | T5-Large 770M | Mismatched: 89.6
natural-language-inference-on-multinli | T5-Small | Matched: 82.4, Mismatched: 82.3
natural-language-inference-on-multinli | T5-Large | Matched: 89.9
natural-language-inference-on-qnli | T5-Small | Accuracy: 90.3%
natural-language-inference-on-qnli | T5-Base | Accuracy: 93.7%
natural-language-inference-on-qnli | T5-11B | Accuracy: 96.7%
natural-language-inference-on-qnli | T5-Large 770M | Accuracy: 94.8%
natural-language-inference-on-qnli | T5-3B | Accuracy: 96.3%
natural-language-inference-on-rte | T5-Large 770M | Accuracy: 87.2%
natural-language-inference-on-rte | T5-Base 220M | Accuracy: 80.1%
natural-language-inference-on-rte | T5-XL 3B | Accuracy: 91.1%
natural-language-inference-on-rte | T5-XXL 11B (fine-tuned) | Accuracy: 92.5%
natural-language-inference-on-rte | T5-Small | Accuracy: 69.9%
natural-language-inference-on-wnli | T5-Base 220M | Accuracy: 78.8
natural-language-inference-on-wnli | T5-Large 770M | Accuracy: 85.6
natural-language-inference-on-wnli | T5-XL 3B | Accuracy: 89.7
natural-language-inference-on-wnli | T5-Small 60M | Accuracy: 69.2
natural-language-inference-on-wnli | T5-XXL 11B | Accuracy: 93.2
poll-generation-on-weibopolls | T5 | BLEU-1: 37.34, BLEU-3: 21.06, ROUGE-1: 45.33, ROUGE-L: 42.69
question-answering-on-boolq | T5-Small 60M (fine-tuned) | Accuracy: 76.4
question-answering-on-boolq | T5-Base 220M (fine-tuned) | Accuracy: 81.4
question-answering-on-boolq | T5-XXL 11B (fine-tuned) | Accuracy: 91.2
question-answering-on-boolq | T5-Large 770M (fine-tuned) | Accuracy: 85.4
question-answering-on-copa | T5-XL 3B (fine-tuned) | Accuracy: 92
question-answering-on-copa | T5-XXL 11B (fine-tuned) | Accuracy: 94.8
question-answering-on-copa | T5-Large 770M (fine-tuned) | Accuracy: 83.4
question-answering-on-copa | T5-Base 220M (fine-tuned) | Accuracy: 71.2
question-answering-on-multirc | T5-XXL 11B (fine-tuned) | F1: 88.1
question-answering-on-multirc | T5-11B | EM: 63.3
question-answering-on-quora-question-pairs | T5-11B | Accuracy: 90.4%
question-answering-on-quora-question-pairs | T5-Small | Accuracy: 88.0%
question-answering-on-quora-question-pairs | T5-Base | Accuracy: 89.4%
question-answering-on-quora-question-pairs | T5-3B | Accuracy: 89.7%
question-answering-on-quora-question-pairs | T5-Large 770M | Accuracy: 89.9%
question-answering-on-squad11-dev | T5-3B | EM: 88.53, F1: 94.95
question-answering-on-squad11-dev | T5-Small | EM: 79.1, F1: 87.24
question-answering-on-squad11-dev | T5-Base | EM: 85.44, F1: 92.08
question-answering-on-squad11-dev | T5-11B | EM: 90.06, F1: 95.64
question-answering-on-squad11-dev | T5-Large 770M | EM: 86.66, F1: 93.79
question-answering-on-webquestions | T5.1.1-XXL+SSM | EM: 42.8
question-generation-on-weibopolls | T5 | BLEU-1: 36.91, BLEU-3: 16.26, ROUGE-1: 44.46, ROUGE-L: 42.06
semantic-parsing-on-webquestionssp | T5-11B (Raffel et al., 2020) | Accuracy: 56.5
semantic-textual-similarity-on-mrpc | T5-3B | Accuracy: 89.2%, F1: 92.5
semantic-textual-similarity-on-mrpc | T5-Large | Accuracy: 89.9%, F1: 92.4
semantic-textual-similarity-on-mrpc | T5-Small | Accuracy: 86.6%, F1: 89.7
semantic-textual-similarity-on-mrpc | T5-11B | Accuracy: 90.0%, F1: 91.9
semantic-textual-similarity-on-mrpc | T5-Base | Accuracy: 87.5%, F1: 90.7
semantic-textual-similarity-on-sts-benchmark | T5-Large 770M | Spearman Correlation: 0.886
semantic-textual-similarity-on-sts-benchmark | T5-Small | Pearson Correlation: 0.856, Spearman Correlation: 0.85
semantic-textual-similarity-on-sts-benchmark | T5-11B | Pearson Correlation: 0.925, Spearman Correlation: 0.921
semantic-textual-similarity-on-sts-benchmark | T5-Base | Pearson Correlation: 0.894
semantic-textual-similarity-on-sts-benchmark | T5-Large | Pearson Correlation: 0.899
semantic-textual-similarity-on-sts-benchmark | T5-3B | Pearson Correlation: 0.906, Spearman Correlation: 0.898
sentiment-analysis-on-sst-2-binary | T5-11B | Accuracy: 97.5
sentiment-analysis-on-sst-2-binary | T5-3B | Accuracy: 97.4
sentiment-analysis-on-sst-2-binary | T5-Large 770M | Accuracy: 96.3
sentiment-analysis-on-sst-2-binary | T5-Small | Accuracy: 91.8
sentiment-analysis-on-sst-2-binary | T5-Base | Accuracy: 95.2
word-sense-disambiguation-on-words-in-context | T5-XXL 11B | Accuracy: 76.9
