PaLM: Scaling Language Modeling with Pathways

Aakanksha Chowdhery; Sharan Narang; Jacob Devlin; Maarten Bosma; Gaurav Mishra; Adam Roberts; Paul Barham; Hyung Won Chung; Charles Sutton; Sebastian Gehrmann; Parker Schuh; Kensen Shi; Sasha Tsvyashchenko; Joshua Maynez; Abhishek Rao; Parker Barnes; Yi Tay; Noam Shazeer; Vinodkumar Prabhakaran; Emily Reif; Nan Du; Ben Hutchinson; Reiner Pope; James Bradbury; Jacob Austin; Michael Isard; Guy Gur-Ari; Pengcheng Yin; Toju Duke; Anselm Levskaya; Sanjay Ghemawat; Sunipa Dev; Henryk Michalewski; Xavier Garcia; Vedant Misra; Kevin Robinson; Liam Fedus; Denny Zhou; Daphne Ippolito; David Luan; Hyeontaek Lim; Barret Zoph; Alexander Spiridonov; Ryan Sepassi; David Dohan; Shivani Agrawal; Mark Omernick; Andrew M. Dai; Thanumalayan Sankaranarayana Pillai; Marie Pellat; Aitor Lewkowycz; Erica Moreira; Rewon Child; Oleksandr Polozov; Katherine Lee; Zongwei Zhou; Xuezhi Wang; Brennan Saeta; Mark Diaz; Orhan Firat; Michele Catasta; Jason Wei; Kathy Meier-Hellstern; Douglas Eck; Jeff Dean; Slav Petrov; Noah Fiedel


Abstract

Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated Transformer language model, which we call the Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned state of the art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis of bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies.
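The few-shot setup the abstract describes can be sketched as prompt construction: a handful of solved examples are concatenated ahead of the unsolved query, and the model is asked to complete the final answer. This is a minimal illustrative sketch; the questions and formatting below are hypothetical, and PaLM itself is not a publicly callable API here.

```python
def build_few_shot_prompt(examples, query):
    """Concatenate k solved (question, answer) examples, then the unsolved query.

    The model is expected to continue the text after the final "A:".
    """
    blocks = [f"Q: {question}\nA: {answer}" for question, answer in examples]
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)

# Hypothetical 2-shot prompt (k=2):
examples = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Japan?", "Tokyo"),
]
prompt = build_few_shot_prompt(examples, "What is the capital of Italy?")
print(prompt)
```

No gradient updates are involved: adapting to a new task only requires changing the examples in the prompt, which is why few-shot evaluation needs so few task-specific training examples.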

Code Repositories

chrisociepa/allamo (PyTorch)
foundation-model-stack/fms-fsdp (PyTorch)
google/paxml (JAX)
lucidrains/CoCa-pytorch (PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
auto-debugging-on-big-bench-lite | PaLM 62B (few-shot, k=5) | Exact string match: 38.2
auto-debugging-on-big-bench-lite | PaLM 8B (few-shot, k=5) | Exact string match: 14.7
auto-debugging-on-big-bench-lite | PaLM 540B (few-shot, k=5) | Exact string match: 38.2
code-generation-on-mbpp | PaLM Coder 540B | Accuracy: 47
code-generation-on-mbpp | PaLM 540B | Accuracy: 36.8
common-sense-reasoning-on-big-bench-known | PaLM-540B (few-shot, k=5) | Accuracy: 73.9
common-sense-reasoning-on-big-bench-winowhy | PaLM-62B (few-shot, k=5) | Accuracy: 61.0
common-sense-reasoning-on-big-bench-winowhy | PaLM-540B (few-shot, k=5) | Accuracy: 65.9
common-sense-reasoning-on-record | PaLM 540B (finetuned) | EM: 94.0, F1: 94.6
common-sense-reasoning-on-winogrande | PaLM 62B (0-shot) | Accuracy: 77.0
common-sense-reasoning-on-winogrande | PaLM 540B (0-shot) | Accuracy: 81.1
common-sense-reasoning-on-winogrande | PaLM-cont 62B (0-shot) | Accuracy: 77.0
coreference-resolution-on-winograd-schema | PaLM 540B (1-shot) | Accuracy: 86.3
coreference-resolution-on-winograd-schema | PaLM 540B (0-shot) | Accuracy: 89.1
coreference-resolution-on-winograd-schema | PaLM 540B (fine-tuned) | Accuracy: 100
coreference-resolution-on-winograd-schema | PaLM 540B (5-shot) | Accuracy: 89.5
cross-lingual-question-answering-on-tydiqa | PaLM-540B (CoT) | EM: 52.9
extreme-summarization-on-gem-xsum | PaLM (finetuning)-540B | Parameters: 540 B, ROUGE-2: 21.2
extreme-summarization-on-gem-xsum | T5-XXL | ROUGE-2: 21.0
extreme-summarization-on-gem-xsum | PaLM (finetuning)-62B | Parameters: 62 B, ROUGE-2: 18.5
language-modelling-on-lambada | PaLM-540B (Zero-Shot) | Accuracy: 77.9
language-modelling-on-lambada | PaLM-540B (Few-Shot) | Accuracy: 89.7
language-modelling-on-lambada | PaLM-540B (One-Shot) | Accuracy: 81.8
logical-reasoning-on-big-bench-strategyqa | PaLM-62B (few-shot, k=5) | Accuracy: 65.4
logical-reasoning-on-big-bench-strategyqa | PaLM-540B (few-shot, k=5) | Accuracy: 73.9
memorization-on-big-bench-hindu-knowledge | PaLM-540B (few-shot, k=5) | Accuracy: 95.4
memorization-on-big-bench-hindu-knowledge | PaLM-62B (few-shot, k=5) | Accuracy: 77.7
multi-task-language-understanding-on-mgsm | PaLM 540B | Average (%): 55.0
multiple-choice-question-answering-mcqa-on-31 | PaLM-62B (few-shot, k=5) | Accuracy: 59.4
multiple-choice-question-answering-mcqa-on-31 | PaLM-540B (few-shot, k=5) | Accuracy: 71.9
natural-language-inference-on-commitmentbank | PaLM 540B (finetuned) | Accuracy: 100, F1: 100
natural-language-inference-on-rte | PaLM 540B (1-shot) | Accuracy: 78.7%
natural-language-inference-on-rte | PaLM 540B (0-shot) | Accuracy: 72.9%
natural-language-inference-on-rte | PaLM 540B (5-shot) | Accuracy: 79.6%
natural-language-inference-on-rte | PaLM 540B (fine-tuned) | Accuracy: 95.7%
question-answering-on-boolq | PaLM 540B (fine-tuned) | Accuracy: 92.2
question-answering-on-copa | PaLM 540B (finetuned) | Accuracy: 100
question-answering-on-multirc | PaLM 540B (finetuned) | EM: 69.2, F1: 90.1
question-answering-on-natural-questions | PaLM-540B (Zero-Shot) | EM: 21.2
question-answering-on-natural-questions | PaLM-540B (One-Shot) | EM: 29.3
question-answering-on-natural-questions | PaLM-540B (Few-Shot, k=64) | EM: 39.6
question-answering-on-obqa | PaLM 540B (zero-shot) | Accuracy: 53.4
question-answering-on-obqa | PaLM 62B (zero-shot) | Accuracy: 50.4
question-answering-on-triviaqa | PaLM-540B (Zero-Shot) | EM: 76.9
question-answering-on-triviaqa | PaLM-540B (One-Shot) | EM: 81.4
question-answering-on-triviaqa | PaLM-540B (Few-Shot) | EM: 81.4
question-answering-on-webquestions | PaLM-540B (Zero-Shot) | EM: 10.6
question-answering-on-webquestions | PaLM-540B (One-Shot) | EM: 22.6
question-answering-on-webquestions | PaLM-540B (Few-Shot) | EM: 43.5
reading-comprehension-on-race | PaLM 8B (zero-shot) | Accuracy (High): 42.3, Accuracy (Middle): 57.9
reading-comprehension-on-race | PaLM 540B (zero-shot) | Accuracy (High): 49.1, Accuracy (Middle): 68.1
reading-comprehension-on-race | PaLM 62B (zero-shot) | Accuracy (High): 47.5, Accuracy (Middle): 64.3
word-sense-disambiguation-on-words-in-context | PaLM 540B (finetuned) | Accuracy: 78.8
