AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model

Abstract
In this work, we demonstrate that multilingual large-scale sequence-to-sequence (seq2seq) models, pre-trained on a mixture of denoising and Causal Language Modeling (CLM) tasks, are more efficient few-shot learners than decoder-only models on various tasks. In particular, we train a 20-billion-parameter multilingual seq2seq model called Alexa Teacher Model (AlexaTM 20B) and show that it achieves state-of-the-art (SOTA) performance on 1-shot summarization tasks, outperforming the much larger 540B-parameter PaLM decoder model. AlexaTM 20B also achieves SOTA in 1-shot machine translation on the Flores-101 dataset, especially for low-resource languages, across almost all language pairs supported by the model (Arabic, English, French, German, Hindi, Italian, Japanese, Marathi, Portuguese, Spanish, Tamil, and Telugu). We also show that, in the zero-shot setting, AlexaTM 20B outperforms GPT-3 (175B) on the SuperGLUE and SQuADv2 datasets and provides SOTA performance on multilingual tasks such as XNLI, XCOPA, PAWS-X, and XWinograd. Overall, our results present a compelling case for seq2seq models as a powerful alternative to decoder-only models for Large-scale Language Model (LLM) training.
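The abstract's pre-training recipe mixes two objectives inside one seq2seq model. As a rough illustration only, the sketch below shows how a single tokenized text could become either a denoising example (reconstruct the full text from a corrupted copy) or a CLM example (continue a prefix). The `[CLM]` marker token, the 0.8 mixing probability, the 0.15 drop rate, the per-token corruption scheme, and all helper names are assumptions chosen for illustration; none of them are stated in this abstract.

```python
from __future__ import annotations

import random
from dataclasses import dataclass

# Hypothetical task marker prepended to CLM inputs so one model can tell the two tasks apart.
CLM_TOKEN = "[CLM]"


@dataclass
class Seq2SeqExample:
    source: list[str]  # tokens fed to the encoder
    target: list[str]  # tokens the decoder must generate


def denoising_example(tokens: list[str], drop_rate: float, rng: random.Random) -> Seq2SeqExample:
    """Drop each token independently with probability drop_rate; the target is the uncorrupted text."""
    kept = [t for t in tokens if rng.random() >= drop_rate]
    return Seq2SeqExample(source=kept, target=list(tokens))


def clm_example(tokens: list[str], rng: random.Random) -> Seq2SeqExample:
    """Split at a random point: the encoder sees the prefix, the decoder generates the continuation."""
    if len(tokens) < 2:
        raise ValueError("CLM needs at least two tokens to form a prefix and a continuation")
    cut = rng.randint(1, len(tokens) - 1)  # prefix and continuation are both non-empty
    return Seq2SeqExample(source=[CLM_TOKEN] + tokens[:cut], target=tokens[cut:])


def make_example(
    tokens: list[str],
    p_denoise: float = 0.8,  # assumed mixing probability, for illustration only
    drop_rate: float = 0.15,  # assumed corruption rate, for illustration only
    rng: random.Random | None = None,
) -> Seq2SeqExample:
    """Sample one task for this example from the denoising/CLM mixture."""
    if rng is None:
        rng = random.Random()
    if rng.random() < p_denoise:
        return denoising_example(tokens, drop_rate, rng)
    return clm_example(tokens, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    text = "multilingual seq2seq models can learn from both objectives".split()
    for _ in range(4):
        ex = make_example(text, rng=rng)
        print(ex.source, "->", ex.target)
```

Prefixing CLM inputs with a marker token is one simple way to let a single encoder-decoder serve both objectives; corrupting contiguous spans instead of independent tokens is a common alternative to the per-token dropping shown here.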
Benchmarks
| Benchmark | Methodology | Metric | Score (%) |
|---|---|---|---|
| common-sense-reasoning-on-record | AlexaTM 20B | F1 | 88.4 |
| coreference-resolution-on-winograd-schema | AlexaTM 20B | Accuracy | 68.3 |
| natural-language-inference-on-commitmentbank | AlexaTM 20B | Accuracy | 67.9 |
| natural-language-inference-on-rte | AlexaTM 20B | Accuracy | 68.6 |
| question-answering-on-boolq | AlexaTM 20B | Accuracy | 69.4 |
| question-answering-on-copa | AlexaTM 20B | Accuracy | 78.0 |
| question-answering-on-multirc | AlexaTM 20B | F1 | 59.6 |
| word-sense-disambiguation-on-words-in-context | AlexaTM 20B | Accuracy | 53.3 |