
Training Compute-Optimal Large Language Models

Abstract

We investigate the optimal model size and number of training tokens for training a transformer language model under a given compute budget. We find that current large language models are significantly undertrained, a consequence of the recent focus on scaling up language models while keeping the amount of training data constant. By training over 400 language models ranging from 70 million to over 16 billion parameters on 5 to 500 billion tokens, we find that for compute-optimal training, the model size and the number of training tokens should be scaled in equal proportion: for every doubling of model size, the number of training tokens should also be doubled. We test this hypothesis by training a predicted compute-optimal model, Chinchilla, which uses the same compute budget as Gopher but has 70 billion parameters and 4x more data. Chinchilla significantly and uniformly outperforms Gopher (280B parameters), GPT-3 (175B parameters), Jurassic-1 (178B parameters), and Megatron-Turing NLG (530B parameters) on a wide range of downstream evaluation tasks. This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatly facilitating downstream use. Notably, Chinchilla reaches an average accuracy of 67.5% on the MMLU benchmark, an improvement of more than 7 percentage points over Gopher.
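
The scaling rule above can be made concrete with the widely used approximation that training a dense transformer costs roughly C ≈ 6·N·D FLOPs for N parameters and D tokens. The Python sketch below is illustrative only and is not code released with the paper; the function names and the ~300B-token (Gopher) and ~1.4T-token (Chinchilla, i.e. roughly 4x Gopher's data) figures are assumptions used for the back-of-the-envelope check.

```python
# Illustrative sketch of the compute-optimal scaling rule from the abstract.
# Assumes the common approximation C ~= 6 * N * D training FLOPs; token counts
# below are assumed reference values, not numbers taken from released code.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute in FLOPs via C ~= 6 * N * D."""
    return 6.0 * n_params * n_tokens


def double_model_compute_optimally(n_params: float, n_tokens: float):
    """Abstract's rule: every doubling of model size should be matched by a
    doubling of training tokens, which requires roughly 4x the compute."""
    return 2.0 * n_params, 2.0 * n_tokens


if __name__ == "__main__":
    # Reference points: Gopher (280B params, ~300B tokens) and Chinchilla
    # (70B params, ~4x Gopher's data, i.e. ~1.4T tokens) use a similar budget.
    gopher = training_flops(280e9, 300e9)        # ~5.0e23 FLOPs
    chinchilla = training_flops(70e9, 1.4e12)    # ~5.9e23 FLOPs
    print(f"Gopher:     {gopher:.2e} FLOPs")
    print(f"Chinchilla: {chinchilla:.2e} FLOPs")

    # Scaling up compute-optimally: doubling parameters also doubles tokens,
    # so the compute budget grows by about 4x.
    n2, d2 = double_model_compute_optimally(70e9, 1.4e12)
    print(f"2x model: {n2:.1e} params, {d2:.1e} tokens, "
          f"{training_flops(n2, d2):.2e} FLOPs (~4x)")
```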

Code Repositories

karpathy/llama2.c (PyTorch), mentioned on GitHub
nkluge-correa/teenytinyllama (PyTorch), mentioned on GitHub

Benchmarks

| Benchmark | Method | Metric |
|---|---|---|
| analogical-similarity-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 38.1 |
| analytic-entailment-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 67.1 |
| common-sense-reasoning-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 54.7 |
| common-sense-reasoning-on-big-bench-causal | Chinchilla-70B (few-shot, k=5) | Accuracy: 57.4 |
| common-sense-reasoning-on-big-bench-date | Chinchilla-70B (few-shot, k=5) | Accuracy: 52.3 |
| common-sense-reasoning-on-big-bench-known | Chinchilla-70B (few-shot, k=5) | Accuracy: 65.2 |
| common-sense-reasoning-on-big-bench-logical | Chinchilla-70B (few-shot, k=5) | Accuracy: 64.1 |
| common-sense-reasoning-on-big-bench-sports | Chinchilla-70B (few-shot, k=5) | Accuracy: 71 |
| common-sense-reasoning-on-big-bench-winowhy | Chinchilla-70B (few-shot, k=5) | Accuracy: 62.5 |
| common-sense-reasoning-on-winogrande | Chinchilla 70B (0-shot) | Accuracy: 74.9 |
| crash-blossom-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 47.6 |
| crass-ai-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 75.0 |
| dark-humor-detection-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 66.2 |
| discourse-marker-prediction-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 13.1 |
| empirical-judgments-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 67.7 |
| english-proverbs-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 82.4 |
| entailed-polarity-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 94 |
| epistemic-reasoning-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 60.6 |
| evaluating-information-essentiality-on-big | Chinchilla-70B (few-shot, k=5) | Accuracy: 17.6 |
| fantasy-reasoning-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 69 |
| figure-of-speech-detection-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 63.3 |
| general-knowledge-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 94.3 |
| gre-reading-comprehension-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 53.1 |
| human-organs-senses-multiple-choice-on-big | Chinchilla-70B (few-shot, k=5) | Accuracy: 85.7 |
| identify-odd-metapor-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 68.8 |
| implicatures-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 75 |
| implicit-relations-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 49.4 |
| intent-recognition-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 92.8 |
| irony-identification-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 73.0 |
| lambada-on-big-bench | Chinchilla-70B (zero-shot) | Accuracy: 77.4 |
| language-modelling-on-lambada | Chinchilla (Zero-Shot) | Accuracy: 77.7 |
| logical-args-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 56.2 |
| logical-reasoning-on-big-bench-formal | Chinchilla-70B (few-shot, k=5) | Accuracy: 52.1 |
| logical-reasoning-on-big-bench-logic-grid | Chinchilla-70B (few-shot, k=5) | Accuracy: 44 |
| logical-reasoning-on-big-bench-logical | Chinchilla-70B (few-shot, k=5) | Accuracy: 72.1 |
| logical-reasoning-on-big-bench-penguins-in-a | Chinchilla-70B (few-shot, k=5) | Accuracy: 48.7 |
| logical-reasoning-on-big-bench-reasoning | Chinchilla-70B (few-shot, k=5) | Accuracy: 59.7 |
| logical-reasoning-on-big-bench-strategyqa | Chinchilla-70B (few-shot, k=5) | Accuracy: 68.3 |
| logical-reasoning-on-big-bench-temporal | Chinchilla-70B (few-shot, k=5) | Accuracy: 32.0 |
| mathematical-induction-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 47.3 |
| metaphor-boolean-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 93.1 |
| misconceptions-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 65.3 |
| moral-permissibility-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 57.3 |
| movie-dialog-same-or-different-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 54.5 |
| multi-task-language-understanding-on-mmlu | Chinchilla-70B (5-shot) | Average (%): 67.5 |
| multiple-choice-question-answering-mcqa-on-27 | Chinchilla-70B (few-shot, k=5) | Accuracy: 54.2 |
| multiple-choice-question-answering-mcqa-on-28 | Chinchilla-70B (few-shot, k=5) | Accuracy: 75.6 |
| multiple-choice-question-answering-mcqa-on-29 | Chinchilla-70B (few-shot, k=5) | Accuracy: 52.6 |
| multiple-choice-question-answering-mcqa-on-30 | Chinchilla-70B (few-shot, k=5) | Accuracy: 47.1 |
| multiple-choice-question-answering-mcqa-on-31 | Chinchilla-70B (few-shot, k=5) | Accuracy: 65.6 |
| nonsense-words-grammar-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 78 |
| odd-one-out-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 70.9 |
| phrase-relatedness-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 94 |
| physical-intuition-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 79 |
| physics-mc-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 65.5 |
| presuppositions-as-nli-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 49.9 |
| question-answering-on-boolq | Chinchilla 70B (0-shot) | Accuracy: 83.7 |
| question-answering-on-natural-questions | Chinchilla (few-shot, k=64) | EM: 35.5 |
| question-answering-on-piqa | Chinchilla 70B (0-shot) | Accuracy: 81.8 |
| question-answering-on-social-iqa | Chinchilla (zero-shot) | Accuracy: 51.3 |
| question-selection-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 52.6 |
| riddle-sense-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 85.7 |
| sarcasm-detection-on-big-bench-snarks | Chinchilla-70B (few-shot, k=5) | Accuracy: 58.6 |
| sentence-ambiguity-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 71.7 |
| similarities-abstraction-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 87 |
| timedial-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 68.8 |
| understanding-fables-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 60.3 |
| word-sense-disambiguation-on-big-bench | Chinchilla-70B (few-shot, k=5) | Accuracy: 69.1 |
