PaLM: Scaling Language Modeling with Pathways

Abstract

Large language models have demonstrated remarkable performance in few-shot learning, drastically reducing the number of task-specific training examples needed to adapt the model to a particular application. To further study the impact of scale on few-shot learning, we trained a 540-billion-parameter, densely activated Transformer language model, which we call the Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new machine learning system that enables highly efficient training across multiple TPU Pods. We demonstrate the continued benefits of scaling by achieving state-of-the-art few-shot results on hundreds of language understanding and generation benchmarks. On many of these tasks, PaLM 540B achieves breakthrough performance, outperforming the fine-tuned state of the art on a suite of multi-step reasoning tasks and exceeding average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance increased steeply as we scaled to our largest model. PaLM also shows strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide range of benchmarks. We additionally provide a comprehensive analysis of bias and toxicity, and study the extent of training data memorization across model scales. Finally, we discuss the ethical considerations related to large language models and explore potential mitigation strategies.
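The few-shot setting reported throughout (e.g. k=5) amounts to prepending k solved exemplars to each test query and letting the model complete the final answer. The sketch below illustrates that prompt construction; the helper function and exemplar texts are hypothetical illustrations, not part of PaLM's actual evaluation harness.

```python
# Minimal sketch of k-shot prompt construction (k = number of
# in-context exemplars). The helper and the exemplar texts are
# hypothetical placeholders, not PaLM's evaluation code.

def build_few_shot_prompt(exemplars, query, k=5):
    """Prepend k solved (question, answer) pairs to the test query."""
    parts = [f"Q: {q}\nA: {a}" for q, a in exemplars[:k]]
    parts.append(f"Q: {query}\nA:")  # the model completes this answer
    return "\n\n".join(parts)

exemplars = [
    ("What is the capital of France?", "Paris"),
    ("What is 2 + 2?", "4"),
]
print(build_few_shot_prompt(exemplars, "What is the capital of Japan?", k=2))
```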

Code Repositories

chrisociepa/allamo (pytorch) · Mentioned in GitHub
foundation-model-stack/fms-fsdp (pytorch) · Mentioned in GitHub
google/paxml (jax) · Mentioned in GitHub
lucidrains/CoCa-pytorch (pytorch) · Mentioned in GitHub

Benchmarks

Benchmark | Method | Metrics
auto-debugging-on-big-bench-lite | PaLM 62B (few-shot, k=5) | Exact string match: 38.2
auto-debugging-on-big-bench-lite | PaLM 8B (few-shot, k=5) | Exact string match: 14.7
auto-debugging-on-big-bench-lite | PaLM 540B (few-shot, k=5) | Exact string match: 38.2
code-generation-on-mbpp | PaLM Coder 540B | Accuracy: 47
code-generation-on-mbpp | PaLM 540B | Accuracy: 36.8
common-sense-reasoning-on-big-bench-known | PaLM-540B (few-shot, k=5) | Accuracy: 73.9
common-sense-reasoning-on-big-bench-winowhy | PaLM-62B (few-shot, k=5) | Accuracy: 61.0
common-sense-reasoning-on-big-bench-winowhy | PaLM-540B (few-shot, k=5) | Accuracy: 65.9
common-sense-reasoning-on-record | PaLM 540B (finetuned) | EM: 94.0, F1: 94.6
common-sense-reasoning-on-winogrande | PaLM 62B (0-shot) | Accuracy: 77.0
common-sense-reasoning-on-winogrande | PaLM 540B (0-shot) | Accuracy: 81.1
common-sense-reasoning-on-winogrande | PaLM-cont 62B (0-shot) | Accuracy: 77.0
coreference-resolution-on-winograd-schema | PaLM 540B (1-shot) | Accuracy: 86.3
coreference-resolution-on-winograd-schema | PaLM 540B (0-shot) | Accuracy: 89.1
coreference-resolution-on-winograd-schema | PaLM 540B (fine-tuned) | Accuracy: 100
coreference-resolution-on-winograd-schema | PaLM 540B (5-shot) | Accuracy: 89.5
cross-lingual-question-answering-on-tydiqa | PaLM-540B (CoT) | EM: 52.9
extreme-summarization-on-gem-xsum | PaLM (finetuning)-540B | Parameters: 540 B, ROUGE-2: 21.2
extreme-summarization-on-gem-xsum | T5-XXL | ROUGE-2: 21.0
extreme-summarization-on-gem-xsum | PaLM (finetuning)-62B | Parameters: 62 B, ROUGE-2: 18.5
language-modelling-on-lambada | PaLM-540B (Zero-Shot) | Accuracy: 77.9
language-modelling-on-lambada | PaLM-540B (Few-Shot) | Accuracy: 89.7
language-modelling-on-lambada | PaLM-540B (One-Shot) | Accuracy: 81.8
logical-reasoning-on-big-bench-strategyqa | PaLM-62B (few-shot, k=5) | Accuracy: 65.4
logical-reasoning-on-big-bench-strategyqa | PaLM-540B (few-shot, k=5) | Accuracy: 73.9
memorization-on-big-bench-hindu-knowledge | PaLM-540B (few-shot, k=5) | Accuracy: 95.4
memorization-on-big-bench-hindu-knowledge | PaLM-62B (few-shot, k=5) | Accuracy: 77.7
multi-task-language-understanding-on-mgsm | PaLM 540B | Average (%): 55.0
multiple-choice-question-answering-mcqa-on-31 | PaLM-62B (few-shot, k=5) | Accuracy: 59.4
multiple-choice-question-answering-mcqa-on-31 | PaLM-540B (few-shot, k=5) | Accuracy: 71.9
natural-language-inference-on-commitmentbank | PaLM 540B (finetuned) | Accuracy: 100, F1: 100
natural-language-inference-on-rte | PaLM 540B (1-shot) | Accuracy: 78.7%
natural-language-inference-on-rte | PaLM 540B (0-shot) | Accuracy: 72.9%
natural-language-inference-on-rte | PaLM 540B (5-shot) | Accuracy: 79.6%
natural-language-inference-on-rte | PaLM 540B (fine-tuned) | Accuracy: 95.7%
question-answering-on-boolq | PaLM 540B (fine-tuned) | Accuracy: 92.2
question-answering-on-copa | PaLM 540B (finetuned) | Accuracy: 100
question-answering-on-multirc | PaLM 540B (finetuned) | EM: 69.2, F1: 90.1
question-answering-on-natural-questions | PaLM-540B (Zero-Shot) | EM: 21.2
question-answering-on-natural-questions | PaLM-540B (One-Shot) | EM: 29.3
question-answering-on-natural-questions | PaLM-540B (Few-Shot, k=64) | EM: 39.6
question-answering-on-obqa | PaLM 540B (zero-shot) | Accuracy: 53.4
question-answering-on-obqa | PaLM 62B (zero-shot) | Accuracy: 50.4
question-answering-on-triviaqa | PaLM-540B (Zero-Shot) | EM: 76.9
question-answering-on-triviaqa | PaLM-540B (One-Shot) | EM: 81.4
question-answering-on-triviaqa | PaLM-540B (Few-Shot) | EM: 81.4
question-answering-on-webquestions | PaLM-540B (Zero-Shot) | EM: 10.6
question-answering-on-webquestions | PaLM-540B (One-Shot) | EM: 22.6
question-answering-on-webquestions | PaLM-540B (Few-Shot) | EM: 43.5
reading-comprehension-on-race | PaLM 8B (zero-shot) | Accuracy (High): 42.3, Accuracy (Middle): 57.9
reading-comprehension-on-race | PaLM 540B (zero-shot) | Accuracy (High): 49.1, Accuracy (Middle): 68.1
reading-comprehension-on-race | PaLM 62B (zero-shot) | Accuracy (High): 47.5, Accuracy (Middle): 64.3
word-sense-disambiguation-on-words-in-context | PaLM 540B (finetuned) | Accuracy: 78.8
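Several rows above report EM (exact match) and token-level F1, the standard pair of QA scoring metrics. Below is a minimal sketch of both, assuming lowercase normalization and whitespace tokenization; production scoring scripts such as the SQuAD evaluator additionally strip punctuation and articles.

```python
from collections import Counter

def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1: harmonic mean of precision and recall over
    the bag of whitespace-separated tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Paris", "paris"))           # 1.0
print(token_f1("the city of Paris", "Paris"))  # 0.4 (partial credit)
```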
