mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer

Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel

Abstract
The recent "Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. We also describe a simple technique to prevent "accidental translation" in the zero-shot setting, where a generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model checkpoints used in this work are publicly available.
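The unified text-to-text format mentioned above casts every task, including classification, as mapping an input string to an output string, so a single generative model and loss cover all tasks. A minimal sketch of that casting, using an illustrative task prefix and label (the exact prefixes and field names here are assumptions, not the ones used in the paper):

```python
def to_text_to_text(task_prefix: str, text: str, label: str) -> tuple[str, str]:
    """Cast one labeled example into an (input, target) string pair.

    Classification labels are emitted as literal text, so the same
    sequence-to-sequence model handles classification and generation alike.
    """
    return (f"{task_prefix}: {text}", label)


# Hypothetical NLI example: the model is trained to generate the word
# "entailment" given the prefixed premise/hypothesis string.
src, tgt = to_text_to_text(
    "nli",
    "premise: It rained. hypothesis: The ground is wet.",
    "entailment",
)
# src == "nli: premise: It rained. hypothesis: The ground is wet."
# tgt == "entailment"
```

Because the wrong-language failure mode ("accidental translation") is a property of free-form generation, it only arises in this setting, where the target is produced token by token rather than picked from a fixed label set.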
Benchmarks
| Benchmark | Model | Metrics |
|---|---|---|
| common-sense-reasoning-on-parus | mT5-Large | Accuracy: 0.504 |
| common-sense-reasoning-on-rucos | mT5-Large | Average F1: 0.57; EM: 0.562 |
| common-sense-reasoning-on-rwsd | mT5-Large | Accuracy: 0.669 |
| natural-language-inference-on-lidirus | mT5-Large | MCC: 0.061 |
| natural-language-inference-on-rcb | mT5-Large | Accuracy: 0.454; Average F1: 0.366 |
| natural-language-inference-on-terra | mT5-Large | Accuracy: 0.561 |
| question-answering-on-danetqa | mT5-Large | Accuracy: 0.657 |
| reading-comprehension-on-muserc | mT5-Large | Average F1: 0.844; EM: 0.543 |
| zero-shot-cross-lingual-transfer-on-xtreme | mT5 | Avg: 40.9; Question Answering: 73.6; Sentence Retrieval: N/A; Sentence-pair Classification: 89.8; Structured Prediction: N/A |