4 months ago

PaLM 2 Technical Report


Abstract


Code Repositories

Benchmarks

| Benchmark | Method | Metric |
|---|---|---|
| arithmetic-reasoning-on-gsm8k | PaLM 2 (few-shot, k=8, CoT) | Accuracy: 80.7 |
| arithmetic-reasoning-on-gsm8k | PaLM 2 (few-shot, k=8, SC) | Accuracy: 91.0 |
| code-generation-on-mbpp | PaLM 2-S* (few-shot) | Accuracy: 50 |
| common-sense-reasoning-on-arc-challenge | PaLM 2-S (1-shot) | Accuracy: 59.6 |
| common-sense-reasoning-on-arc-challenge | PaLM 2 (few-shot, CoT, SC) | Accuracy: 95.1 |
| common-sense-reasoning-on-arc-challenge | PaLM 2-M (1-shot) | Accuracy: 64.9 |
| common-sense-reasoning-on-arc-challenge | PaLM 2-L (1-shot) | Accuracy: 69.2 |
| common-sense-reasoning-on-arc-easy | PaLM 2-S (1-shot) | Accuracy: 85.6 |
| common-sense-reasoning-on-arc-easy | PaLM 2-L (1-shot) | Accuracy: 89.7 |
| common-sense-reasoning-on-arc-easy | PaLM 2-M (1-shot) | Accuracy: 88.0 |
| common-sense-reasoning-on-big-bench | PaLM 2 (few-shot, k=3, CoT) | Accuracy: 77.6 |
| common-sense-reasoning-on-big-bench | PaLM 2 (few-shot, k=3, Direct) | Accuracy: 78.8 |
| common-sense-reasoning-on-big-bench-causal | PaLM 2 (few-shot, k=3, Direct) | Accuracy: 62.0 |
| common-sense-reasoning-on-big-bench-causal | PaLM 2 (few-shot, k=3, CoT) | Accuracy: 58.8 |
| common-sense-reasoning-on-big-bench-date | PaLM 2 (few-shot, k=3, CoT) | Accuracy: 91.2 |
| common-sense-reasoning-on-big-bench-date | PaLM 2 (few-shot, k=3, Direct) | Accuracy: 74.0 |
| common-sense-reasoning-on-big-bench-sports | PaLM 2 (few-shot, k=3, CoT) | Accuracy: 98 |
| common-sense-reasoning-on-big-bench-sports | PaLM 2 (few-shot, k=3, Direct) | Accuracy: 90.8 |
| common-sense-reasoning-on-commonsenseqa | PaLM 2 (few-shot, CoT, SC) | Accuracy: 90.4 |
| common-sense-reasoning-on-record | PaLM 2-L (one-shot) | F1: 93.8 |
| common-sense-reasoning-on-record | PaLM 2-M (one-shot) | F1: 92.4 |
| common-sense-reasoning-on-record | PaLM 2-S (one-shot) | F1: 92.1 |
| common-sense-reasoning-on-winogrande | PaLM 2-S (1-shot) | Accuracy: 77.9 |
| common-sense-reasoning-on-winogrande | PaLM 2-M (1-shot) | Accuracy: 79.2 |
| common-sense-reasoning-on-winogrande | PaLM 2-L (1-shot) | Accuracy: 83.0 |
| coreference-resolution-on-winograd-schema | PaLM 2-M (1-shot) | Accuracy: 88.1 |
| coreference-resolution-on-winograd-schema | PaLM 2-S (1-shot) | Accuracy: 84.6 |
| coreference-resolution-on-winograd-schema | PaLM 2-L (1-shot) | Accuracy: 86.9 |
| cross-lingual-question-answering-on-tydiqa | PaLM 2-M (one-shot) | F1: 73.3 |
| cross-lingual-question-answering-on-tydiqa | PaLM 2-S (one-shot) | F1: 73.3 |
| cross-lingual-question-answering-on-tydiqa | PaLM 2-L (one-shot) | F1: 73.6 |
| cross-lingual-transfer-on-xcopa | PaLM 2 (few-shot) | Accuracy: 94.4 |
| language-modelling-on-lambada | PaLM 2-M (one-shot) | Accuracy: 83.7 |
| language-modelling-on-lambada | PaLM 2-L (one-shot) | Accuracy: 86.9 |
| language-modelling-on-lambada | PaLM 2-S (one-shot) | Accuracy: 80.7 |
| logical-reasoning-on-big-bench-formal | PaLM 2 (few-shot, k=3, Direct) | Accuracy: 64.8 |
| logical-reasoning-on-big-bench-formal | PaLM 2 (few-shot, k=3, CoT) | Accuracy: 57.2 |
| logical-reasoning-on-big-bench-logic-grid | PaLM-62B (few-shot, k=5) | Accuracy: 36.5 |
| logical-reasoning-on-big-bench-logic-grid | PaLM-540B (few-shot, k=5) | Accuracy: 42.4 |
| logical-reasoning-on-big-bench-penguins-in-a | PaLM 2 (few-shot, k=3, CoT) | Accuracy: 84.9 |
| logical-reasoning-on-big-bench-penguins-in-a | PaLM 2 (few-shot, k=3, Direct) | Accuracy: 65.8 |
| logical-reasoning-on-big-bench-reasoning | PaLM 2 (few-shot, k=3, Direct) | Accuracy: 61.2 |
| logical-reasoning-on-big-bench-reasoning | PaLM 2 (few-shot, k=3, CoT) | Accuracy: 91.2 |
| logical-reasoning-on-big-bench-temporal | PaLM 2 (few-shot, k=3, CoT) | Accuracy: 100 |
| logical-reasoning-on-big-bench-temporal | PaLM 2 (few-shot, k=3, Direct) | Accuracy: 96.4 |
| machine-translation-on-frmt-chinese-mainland | Google Translate | BLEURT: 72.3 |
| machine-translation-on-frmt-chinese-mainland | PaLM 2 | BLEURT: 74.4 |
| machine-translation-on-frmt-chinese-mainland | PaLM | BLEURT: 70.3 |
| machine-translation-on-frmt-chinese-taiwan | PaLM 2 | BLEURT: 72.0 |
| machine-translation-on-frmt-chinese-taiwan | Google Translate | BLEURT: 68.5 |
| machine-translation-on-frmt-chinese-taiwan | PaLM | BLEURT: 68.6 |
| machine-translation-on-frmt-portuguese | PaLM 2 | BLEURT: 78.3 |
| machine-translation-on-frmt-portuguese | PaLM | BLEURT: 76.1 |
| machine-translation-on-frmt-portuguese | Google Translate | BLEURT: 75.3 |
| machine-translation-on-frmt-portuguese-brazil | Google Translate | BLEURT: 80.2 |
| machine-translation-on-frmt-portuguese-brazil | PaLM | BLEURT: 78.5 |
| machine-translation-on-frmt-portuguese-brazil | PaLM 2 | BLEURT: 81.1 |
| math-word-problem-solving-on-math | PaLM 2 (few-shot, k=4, CoT) | Accuracy: 34.3 |
| math-word-problem-solving-on-math | PaLM 2 (few-shot, k=4, SC) | Accuracy: 48.8 |
| multi-task-language-understanding-on-mgsm | PaLM 2 (few-shot, k=8, SC) | Average (%): 87.0 |
| multi-task-language-understanding-on-mgsm | PaLM 2 (8-shot, CoT) | Average (%): 72.2 |
| multiple-choice-question-answering-mcqa-on-27 | PaLM 2 (few-shot, k=3, Direct) | Accuracy: 84.8 |
| multiple-choice-question-answering-mcqa-on-27 | PaLM 2 (few-shot, k=3, CoT) | Accuracy: 82.4 |
| multiple-choice-question-answering-mcqa-on-28 | PaLM 2 (few-shot, k=3, Direct) | Accuracy: 93.6 |
| multiple-choice-question-answering-mcqa-on-28 | PaLM 2 (few-shot, k=3, CoT) | Accuracy: 94.4 |
| multiple-choice-question-answering-mcqa-on-29 | PaLM 2 (few-shot, k=3, Direct) | Accuracy: 68.8 |
| multiple-choice-question-answering-mcqa-on-29 | PaLM 2 (few-shot, k=3, CoT) | Accuracy: 91.2 |
| multiple-choice-question-answering-mcqa-on-30 | PaLM 2 (few-shot, k=3, CoT) | Accuracy: 83.6 |
| multiple-choice-question-answering-mcqa-on-30 | PaLM 2 (few-shot, k=3, Direct) | Accuracy: 90 |
| natural-language-inference-on-anli-test | PaLM 2-S (one-shot) | A1: 53.1; A2: 48.8; A3: 53.2 |
| natural-language-inference-on-anli-test | PaLM 2-L (one-shot) | A1: 73.1; A2: 63.4; A3: 67.1 |
| natural-language-inference-on-anli-test | PaLM 2-M (one-shot) | A1: 58.1; A2: 49.5; A3: 54.5 |
| natural-language-inference-on-commitmentbank | PaLM 2-S (one-shot) | Accuracy: 82.1 |
| natural-language-inference-on-commitmentbank | PaLM 2-M (one-shot) | Accuracy: 80.4 |
| natural-language-inference-on-commitmentbank | PaLM 2-L (one-shot) | Accuracy: 87.5 |
| natural-language-inference-on-rte | PaLM 2-L (1-shot) | Accuracy: 79.3% |
| natural-language-inference-on-rte | PaLM 2-S (1-shot) | Accuracy: 78.7% |
| natural-language-inference-on-rte | PaLM 2-M (1-shot) | Accuracy: 81.9% |
| question-answering-on-boolq | PaLM 2-S (1-shot) | Accuracy: 88.1 |
| question-answering-on-boolq | PaLM 2-L (1-shot) | Accuracy: 90.9 |
| question-answering-on-boolq | PaLM 2-M (1-shot) | Accuracy: 88.6 |
| question-answering-on-copa | PaLM 2-M (1-shot) | Accuracy: 90.0 |
| question-answering-on-copa | PaLM 2-S (1-shot) | Accuracy: 89.0 |
| question-answering-on-copa | PaLM 2-L (1-shot) | Accuracy: 96.0 |
| question-answering-on-drop-test | PaLM 2 (few-shot) | F1: 85.0 |
| question-answering-on-multirc | PaLM 2-S (one-shot) | F1: 84.0 |
| question-answering-on-multirc | PaLM 2-M (one-shot) | F1: 84.1 |
| question-answering-on-multirc | PaLM 2-L (one-shot) | F1: 88.2 |
| question-answering-on-natural-questions | PaLM 2-S (one-shot) | EM: 25.3 |
| question-answering-on-natural-questions | PaLM 2-M (one-shot) | EM: 32.0 |
| question-answering-on-natural-questions | PaLM 2-L (one-shot) | EM: 37.5 |
| question-answering-on-openbookqa | PaLM 2-L (1-shot) | Accuracy: 58.5 |
| question-answering-on-openbookqa | PaLM 2-M (1-shot) | Accuracy: 56.2 |
| question-answering-on-openbookqa | PaLM 2-S (1-shot) | Accuracy: 57.4 |
| question-answering-on-piqa | PaLM 2-M (1-shot) | Accuracy: 83.2 |
| question-answering-on-piqa | PaLM 2-S (1-shot) | Accuracy: 82.2 |
| question-answering-on-piqa | PaLM 2-L (1-shot) | Accuracy: 85.0 |
| question-answering-on-story-cloze | PaLM 2-M (one-shot) | Accuracy: 86.7 |
| question-answering-on-story-cloze | PaLM 2-S (one-shot) | Accuracy: 85.6 |
| question-answering-on-story-cloze | PaLM 2-L (one-shot) | Accuracy: 87.4 |
| question-answering-on-strategyqa | PaLM 2 (few-shot, CoT, SC) | Accuracy: 90.4 |
| question-answering-on-triviaqa | PaLM 2-S (one-shot) | EM: 75.2 |
| question-answering-on-triviaqa | PaLM 2-M (one-shot) | EM: 81.7 |
| question-answering-on-triviaqa | PaLM 2-L (one-shot) | EM: 86.1 |
| question-answering-on-webquestions | PaLM 2-S (one-shot) | EM: 21.8 |
| question-answering-on-webquestions | PaLM 2-L (one-shot) | EM: 28.2 |
| question-answering-on-webquestions | PaLM 2-M (one-shot) | EM: 26.9 |
| sarcasm-detection-on-big-bench-snarks | PaLM 2 (few-shot, k=3, Direct) | Accuracy: 78.7 |
| sarcasm-detection-on-big-bench-snarks | PaLM 2 (few-shot, k=3, CoT) | Accuracy: 84.8 |
| text-summarization-on-x-sum | PaLM 2-S (one-shot) | ROUGE-2: 16.9 |
| text-summarization-on-x-sum | PaLM 2-L (one-shot) | ROUGE-2: 23.2 |
| text-summarization-on-x-sum | PaLM 2-M (one-shot) | ROUGE-2: 17.2 |
| toxic-comment-classification-on-civil | PaLM 2 (zero-shot) | AUROC: 0.7596 |
| toxic-comment-classification-on-civil | PaLM 2 (few-shot, k=10) | AUROC: 0.8535 |
| word-sense-disambiguation-on-words-in-context | PaLM 2-L (one-shot) | Accuracy: 66.8 |
| word-sense-disambiguation-on-words-in-context | PaLM 2-S (one-shot) | Accuracy: 50.6 |
| word-sense-disambiguation-on-words-in-context | PaLM 2-M (one-shot) | Accuracy: 52.0 |
