
Scaling Language Models: Methods, Analysis & Insights from Training Gopher


Abstract

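Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales, from models with tens of millions of parameters up to a 280-billion-parameter model called Gopher. These models are evaluated on 152 diverse tasks and achieve state-of-the-art performance on the majority of them. Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language, whereas logical and mathematical reasoning benefit less. We provide a holistic analysis of the training dataset and the models' behaviour, covering the intersection of model scale with bias and toxicity. Finally, we discuss the application of language models to AI safety and the mitigation of downstream harms.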
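Most Gopher results are reported as few-shot accuracy on multiple-choice tasks, where k is the number of solved examples placed in the prompt (k=5 means five in-context examples). A common way to score such tasks is to prepend the k examples to the new question and pick the answer option to which the model assigns the highest log-likelihood; TruthfulQA's MC1 metric is likewise the fraction of questions where the single true answer gets the highest score. The sketch below illustrates that procedure under stated assumptions: `score_fn` is a hypothetical stand-in for a real model's log-probability of a continuation, and the "Q:/A:" template is illustrative, not Gopher's actual evaluation prompt.

```python
from typing import Callable, Sequence

# Hypothetical scorer (assumption): returns log P(continuation | prompt) under some language model.
ScoreFn = Callable[[str, str], float]

# One multiple-choice item: (question, answer options, index of the correct option).
Item = tuple[str, Sequence[str], int]


def build_k_shot_prompt(examples: Sequence[tuple[str, str]], question: str) -> str:
    """Concatenate k solved (question, answer) pairs, then the unanswered question."""
    shots = [f"Q: {q}\nA: {a}" for q, a in examples]
    return "\n\n".join(shots + [f"Q: {question}\nA:"])


def predict(score_fn: ScoreFn, prompt: str, options: Sequence[str]) -> int:
    """Return the index of the option with the highest log-likelihood (first wins ties)."""
    scores = [score_fn(prompt, " " + opt) for opt in options]
    return max(range(len(options)), key=scores.__getitem__)


def k_shot_accuracy(score_fn: ScoreFn, train: Sequence[Item], test: Sequence[Item], k: int = 5) -> float:
    """Accuracy in percent, using the first k training items as in-context examples."""
    shots = [(q, opts[gold]) for q, opts, gold in train[:k]]
    correct = 0
    for question, options, gold in test:
        prompt = build_k_shot_prompt(shots, question)
        correct += predict(score_fn, prompt, options) == gold
    return 100.0 * correct / len(test)
```

With a deliberately naive toy scorer that simply prefers shorter answers, `k_shot_accuracy(lambda p, c: -len(c), [("2+2?", ["4", "22"], 0)], [("1+1?", ["2", "11"], 0), ("Capital of France?", ["Paris", "Rome"], 0)])` returns 50.0: it gets the arithmetic item right and the geography item wrong, which is why a real model's likelihoods are needed.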

Code Repositories

allenai/dolma: mentioned in GitHub
bramiozo/PubScience: mentioned in GitHub
rvlopes/gloria (pytorch): mentioned in GitHub

Benchmarks

Benchmark | Method | Metric
abstract-algebra-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 25.0
analogical-similarity-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 17.2
analytic-entailment-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 53.0
anatomy-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 56.3
astronomy-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 65.8
business-ethics-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 70.0
clinical-knowledge-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 67.2
college-biology-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 70.8
college-chemistry-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 45.0
college-computer-science-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 49
college-mathematics-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 37.0
college-medicine-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 60.1
college-physics-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 34.3
common-sense-reasoning-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 45.5
common-sense-reasoning-on-big-bench-causal | Gopher-280B (few-shot, k=5) | Accuracy: 50.8
common-sense-reasoning-on-big-bench-date | Gopher-280B (few-shot, k=5) | Accuracy: 44.1
common-sense-reasoning-on-big-bench-known | Gopher-280B (few-shot, k=5) | Accuracy: 63.6
common-sense-reasoning-on-big-bench-logical | Gopher-280B (few-shot, k=5) | Accuracy: 36.4
common-sense-reasoning-on-big-bench-sports | Gopher-280B (few-shot, k=5) | Accuracy: 54.9
common-sense-reasoning-on-big-bench-winowhy | Gopher-280B (few-shot, k=5) | Accuracy: 56.7
common-sense-reasoning-on-winogrande | Gopher 280B (0-shot) | Accuracy: 70.1
computer-security-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 65.0
conceptual-physics-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 49.4
crash-blossom-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 63.6
crass-ai-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 56.8
dark-humor-detection-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 83.1
discourse-marker-prediction-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 11.7
econometrics-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 43
electrical-engineering-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 60
elementary-mathematics-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 33.6
empirical-judgments-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 52.5
english-proverbs-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 57.6
entailed-polarity-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 89.5
epistemic-reasoning-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 56.4
evaluating-information-essentiality-on-big | Gopher-280B (few-shot, k=5) | Accuracy: 16.7
fantasy-reasoning-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 64.1
fever-2-way-on-big-bench | Gopher-280B (few-shot, k=10) | Accuracy: 77.5
fever-3-way-on-big-bench | Gopher-280B (few-shot, k=15) | Accuracy: 77.5
figure-of-speech-detection-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 52.7
formal-logic-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 35.7
general-knowledge-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 93.9
global-facts-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 38.0
gre-reading-comprehension-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 27.3
high-school-biology-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 71.3
high-school-chemistry-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 47.8
high-school-computer-science-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 54.0
high-school-european-history-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 72.1
high-school-geography-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 76.8
high-school-government-and-politics-on-big | Gopher-280B (few-shot, k=5) | Accuracy: 83.9
high-school-macroeconomics-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 65.1
high-school-mathematics-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 23.7
high-school-microeconomics-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 66.4
high-school-physics-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 33.8
high-school-psychology-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 81.8
high-school-statistics-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 50
high-school-us-history-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 78.9
high-school-world-history-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 75.1
human-aging-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 66.4
human-organs-senses-multiple-choice-on-big | Gopher-280B (few-shot, k=5) | Accuracy: 84.8
human-sexuality-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 67.2
identify-odd-metapor-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 38.6
implicatures-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 62.0
implicit-relations-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 36.4
intent-recognition-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 88.7
international-law-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 77.7
irony-identification-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 69.7
jurisprudence-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 71.3
lambada-on-big-bench | Gopher-280B (zero-shot) | Accuracy: 74.5
language-modelling-on-arxiv | Gopher | BPB: 0.662
language-modelling-on-bookcorpus2 | Gopher | BPB: 0.741
language-modelling-on-books3 | Gopher | BPB: 0.712
language-modelling-on-curation-corpus | Gopher | BPB: 0.475
language-modelling-on-dm-mathematics | Gopher | BPB: 1.14
language-modelling-on-freelaw | Gopher | BPB: 0.513
language-modelling-on-github | Gopher | BPB: 0.377
language-modelling-on-gutenberg-pg-19 | Gopher | BPB: 0.656
language-modelling-on-hackernews | Gopher | BPB: 0.890
language-modelling-on-nih-exporter | Gopher | BPB: 0.590
language-modelling-on-opensubtitles | Gopher | BPB: 0.899
language-modelling-on-openwebtext2 | Gopher | BPB: 0.677
language-modelling-on-philpapers | Gopher | BPB: 0.695
language-modelling-on-pile-cc | Gopher | BPB: 0.691
language-modelling-on-pubmed-abstracts | Gopher | BPB: 0.577
language-modelling-on-pubmed-central | Gopher | BPB: 0.525
language-modelling-on-stackexchange | Gopher | BPB: 0.641
language-modelling-on-ubuntu-irc | Gopher | BPB: 1.09
language-modelling-on-uspto-backgrounds | Gopher | BPB: 0.546
logical-args-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 59.1
logical-fallacies-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 72.4
logical-reasoning-on-big-bench-formal | Gopher-280B (few-shot, k=5) | Accuracy: 50.7
logical-reasoning-on-big-bench-logic-grid | Gopher-280B (few-shot, k=5) | Accuracy: 35.1
logical-reasoning-on-big-bench-logical | Gopher-280B (few-shot, k=5) | Accuracy: 58.9
logical-reasoning-on-big-bench-penguins-in-a | Gopher-280B (few-shot, k=5) | Accuracy: 40.6
logical-reasoning-on-big-bench-reasoning | Gopher-280B (few-shot, k=5) | Accuracy: 49.2
logical-reasoning-on-big-bench-strategyqa | Gopher-280B (few-shot, k=5) | Accuracy: 61.0
logical-reasoning-on-big-bench-temporal | Gopher-280B (few-shot, k=5) | Accuracy: 19.0
machine-learning-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 41.1
management-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 77.7
marketing-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 83.3
mathematical-induction-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 57.6
medical-genetics-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 69.0
memorization-on-big-bench-hindu-knowledge | Gopher-280B (few-shot, k=5) | Accuracy: 80
metaphor-boolean-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 59.3
miscellaneous-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 75.7
misconceptions-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 61.7
moral-disputes-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 66.8
moral-permissibility-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 55.1
moral-scenarios-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 40.2
movie-dialog-same-or-different-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 50.7
multi-task-language-understanding-on-mmlu | Gopher 7.1B (5-shot) | Average (%): 29.5
multiple-choice-question-answering-mcqa-on-27 | Gopher-280B (few-shot, k=5) | Accuracy: 51.7
multiple-choice-question-answering-mcqa-on-28 | Gopher-280B (few-shot, k=5) | Accuracy: 50.5
multiple-choice-question-answering-mcqa-on-29 | Gopher-280B (few-shot, k=5) | Accuracy: 51.1
multiple-choice-question-answering-mcqa-on-30 | Gopher-280B (few-shot, k=5) | Accuracy: 38.6
multiple-choice-question-answering-mcqa-on-31 | Gopher-280B (few-shot, k=5) | Accuracy: 59.1
natural-questions-on-big-bench | Gopher-280B (few-shot, k=64) | Accuracy: 28.2
nonsense-words-grammar-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 61.4
nutrition-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 69.9
odd-one-out-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 32.5
philosophy-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 68.8
phrase-relatedness-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 81.8
physical-intuition-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 59.7
physics-mc-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 50.9
prehistory-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 67.6
presuppositions-as-nli-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 34.0
professional-accounting-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 44.3
professional-law-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 44.5
professional-medicine-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 64.0
professional-psychology-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 68.1
public-relations-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 71.8
question-answering-on-boolq | Gopher (zero-shot) | Accuracy: 79.3
question-answering-on-natural-questions | Gopher (few-shot, k=64) | EM: 28.2
question-answering-on-piqa | Gopher 280B (0-shot) | Accuracy: 81.8
question-answering-on-social-iqa | Gopher (zero-shot) | Accuracy: 50.6
question-answering-on-truthfulqa | Gopher 280B (zero-shot, QA prompts) | MC1: 0.27
question-answering-on-truthfulqa | Gopher 7.1B (zero-shot, QA prompts) | MC1: 0.25
question-answering-on-truthfulqa | Gopher 7.1B (zero-shot, Our Prompt + Choices) | MC1: 0.23
question-answering-on-truthfulqa | Gopher 1.4B (zero-shot, QA prompts) | MC1: 0.23
question-answering-on-truthfulqa | Gopher 280B (zero-shot, Our Prompt + Choices) | MC1: 0.295
question-answering-on-truthfulqa | Gopher 1.4B (zero-shot, Our Prompt + Choices) | MC1: 0.217
question-selection-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 41.4
race-h-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 71.6
race-m-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 75.1
riddle-sense-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 68.2
sarcasm-detection-on-big-bench-snarks | Gopher-280B (few-shot, k=5) | Accuracy: 48.3
security-studies-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 64.9
sentence-ambiguity-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 69.1
similarities-abstraction-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 81.8
sociology-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 84.1
timedial-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 50.9
triviaqa-on-big-bench | Gopher-280B (few-shot, k=64) | Accuracy: 57.1
understanding-fables-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 39.6
us-foreign-policy-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 81.0
virology-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 47.0
word-sense-disambiguation-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 56.4
world-religions-on-big-bench | Gopher-280B (few-shot, k=5) | Accuracy: 84.2
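The language-modelling results are reported in BPB (bits per byte), where lower is better. Dividing the model's total negative log-likelihood by the number of UTF-8 bytes, rather than by the number of tokens, makes scores comparable across models with different tokenizers. The sketch below shows that conversion, assuming the per-token log-probabilities are natural logarithms and together cover exactly the given text; the function name `bits_per_byte` is illustrative only.

```python
import math
from typing import Sequence


def bits_per_byte(token_logprobs: Sequence[float], text: str) -> float:
    """Convert per-token natural-log probabilities for `text` into bits per UTF-8 byte.

    Assumption: token_logprobs are ln P(token | prefix) for every token of `text`.
    """
    total_nats = -sum(token_logprobs)        # negative log-likelihood in nats
    total_bits = total_nats / math.log(2)    # convert nats to bits
    n_bytes = len(text.encode("utf-8"))      # normalise by bytes, not tokens
    return total_bits / n_bytes
```

As a sanity check, a byte-level model that assigns uniform probability 1/256 to every byte scores 8.0 BPB, so the values around 0.4 to 1.1 in the rows above correspond to far more confident predictions than chance.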
