Galactica: A Large Language Model for Science

Abstract

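Information overload is a major obstacle to scientific progress. The explosive growth of scientific literature and data has made it increasingly hard to find useful insights in a large mass of information. Today scientific knowledge is accessed mainly through search engines, but search engines alone cannot organize that knowledge. This paper introduces Galactica, a large language model that can store, combine, and reason about scientific knowledge. The model is trained on a large scientific corpus of papers, reference material, knowledge bases, and many other sources. It outperforms existing models on a range of scientific tasks. On technical knowledge probes such as LaTeX equations, Galactica scores 68.2% versus 49.0% for the latest GPT-3. Galactica also performs well on reasoning: it outperforms Chinchilla on mathematical MMLU (41.3% versus 35.7%) and PaLM 540B on MATH (20.4% versus 8.8%). It also sets a new state of the art on downstream tasks such as PubMedQA and the MedMCQA dev set, with 77.6% and 52.9% respectively. Despite not being trained on a general corpus, Galactica outperforms BLOOM and OPT-175B on BIG-bench. The authors believe these results demonstrate the potential of language models as a new interface for science. The model is open-sourced for the benefit of the scientific community.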

Benchmarks

Benchmark | Method | Metrics
bias-detection-on-stereoset-1 | OPT 175B | ICAT Score: 60; LMS: 74.8; SS: 59.9
bias-detection-on-stereoset-1 | GAL 120B | ICAT Score: 65.6; LMS: 75; SS: 56.2
bias-detection-on-stereoset-1 | GPT-3 (text-davinci-002) | ICAT Score: 60.8; LMS: 77.6; SS: 60.8
common-sense-reasoning-on-arc-challenge | BLOOM (few-shot, k=5) | Accuracy: 32.9
common-sense-reasoning-on-arc-challenge | GAL 120B (zero-shot) | Accuracy: 67.9
common-sense-reasoning-on-arc-challenge | OPT (few-shot, k=5) | Accuracy: 31.1
common-sense-reasoning-on-arc-challenge | GPT-3 (zero-shot) | Accuracy: 51.4
common-sense-reasoning-on-arc-easy | GAL 120B (0-shot) | Accuracy: 83.8
common-sense-reasoning-on-arc-easy | BLOOM (5-shot) | Accuracy: 40.7
common-sense-reasoning-on-arc-easy | GPT-3 (zero-shot) | Accuracy: 68.8
common-sense-reasoning-on-arc-easy | OPT (5-shot) | Accuracy: 37.4
math-word-problem-solving-on-math | GAL 120B <work> | Accuracy: 16.6; Parameters (Billions): 120
math-word-problem-solving-on-math | GAL 120B (5-shot) mCoT | Accuracy: 20.4; Parameters (Billions): 120
math-word-problem-solving-on-math | Minerva 540B (5-shot) mCoT | Accuracy: 33.6; Parameters (Billions): 540
math-word-problem-solving-on-math | GAL 30B <work> | Accuracy: 11.4; Parameters (Billions): 30
math-word-problem-solving-on-math | PaLM 540B (5-shot) mCoT | Accuracy: 8.8; Parameters (Billions): 540
math-word-problem-solving-on-math | GPT-3 175B (8-shot) | Accuracy: 5.2; Parameters (Billions): 175
math-word-problem-solving-on-math | GAL 30B (5-shot) mCoT | Accuracy: 12.7; Parameters (Billions): 30
mathematical-reasoning-on-mmlu-mathematics | GAL 120B <work> | Accuracy: 41.3
molecular-property-prediction-on-bace-1 | GAL 1.3B | ROC-AUC: 57.6
molecular-property-prediction-on-bace-1 | GAL 30B | ROC-AUC: 72.7
molecular-property-prediction-on-bace-1 | GAL 125M | ROC-AUC: 56.1
molecular-property-prediction-on-bace-1 | GAL 120B | ROC-AUC: 61.7
molecular-property-prediction-on-bace-1 | GAL 6.7B | ROC-AUC: 58.4
molecular-property-prediction-on-bbbp-1 | GAL 6.7B | ROC-AUC: 53.5
molecular-property-prediction-on-bbbp-1 | GAL 125M | ROC-AUC: 39.3
molecular-property-prediction-on-bbbp-1 | GAL 120B | ROC-AUC: 66.1
molecular-property-prediction-on-bbbp-1 | Uni-Mol | ROC-AUC: 72.9
molecular-property-prediction-on-bbbp-1 | GAL 30B | ROC-AUC: 59.6
molecular-property-prediction-on-bbbp-1 | GAL 1.3B | ROC-AUC: 60.4
molecular-property-prediction-on-clintox-1 | GAL 1.3B | Molecules (M): 2; ROC-AUC: 58.9
molecular-property-prediction-on-clintox-1 | GAL 125M | Molecules (M): 2; ROC-AUC: 51.8
molecular-property-prediction-on-clintox-1 | GAL 120B | Molecules (M): 2; ROC-AUC: 82.6
molecular-property-prediction-on-clintox-1 | GAL 6.7B | Molecules (M): 2; ROC-AUC: 78.4
molecular-property-prediction-on-clintox-1 | GAL 30B | Molecules (M): 2; ROC-AUC: 82.2
molecular-property-prediction-on-hiv-dataset | GAL 30B | AUC: 0.759
molecular-property-prediction-on-hiv-dataset | GAL 1.3B | AUC: 0.724
molecular-property-prediction-on-hiv-dataset | GAL 125M | AUC: 0.702
molecular-property-prediction-on-hiv-dataset | GAL 6.7B | AUC: 0.722
molecular-property-prediction-on-hiv-dataset | Uni-Mol | AUC: 0.808
molecular-property-prediction-on-hiv-dataset | GAL 120B | AUC: 0.745
molecular-property-prediction-on-moleculenet | GAL 30B | AUC: 0.69
molecular-property-prediction-on-moleculenet | GAL 125M | AUC: 0.581
molecular-property-prediction-on-moleculenet | GAL 1.3B | AUC: 0.619
molecular-property-prediction-on-moleculenet | GAL 6.7B | AUC: 0.64
molecular-property-prediction-on-moleculenet | Uni-Mol | AUC: 0.77
molecular-property-prediction-on-sider-1 | GAL 125M | ROC-AUC: 55.9
molecular-property-prediction-on-sider-1 | GAL 1.3B | ROC-AUC: 54.0
molecular-property-prediction-on-sider-1 | GAL 6.7B | ROC-AUC: 55.9
molecular-property-prediction-on-sider-1 | GAL 120B | ROC-AUC: 63.2
molecular-property-prediction-on-sider-1 | GAL 30B | ROC-AUC: 61.3
molecular-property-prediction-on-tox21-1 | GAL 125M | ROC-AUC: 54.3
molecular-property-prediction-on-tox21-1 | GAL 120B | ROC-AUC: 68.9
molecular-property-prediction-on-tox21-1 | Uni-Mol | ROC-AUC: 79.6
molecular-property-prediction-on-tox21-1 | GAL 6.7B | ROC-AUC: 63.9
molecular-property-prediction-on-tox21-1 | GAL 30B | ROC-AUC: 68.5
molecular-property-prediction-on-tox21-1 | GAL 1.3B | ROC-AUC: 60.6
multi-task-language-understanding-on-mmlu | GAL 120B (zero-shot) | Average (%): 52.6
multiple-choice-question-answering-mcqa-on-10 | BLOOM (few-shot, k=5) | Accuracy: 27.6
multiple-choice-question-answering-mcqa-on-10 | Gopher (few-shot, k=5) | Accuracy: 33.6
multiple-choice-question-answering-mcqa-on-10 | Chinchilla (few-shot, k=5) | Accuracy: 41.5
multiple-choice-question-answering-mcqa-on-10 | OPT (few-shot, k=5) | Accuracy: 25.7
multiple-choice-question-answering-mcqa-on-10 | GAL 120B (zero-shot) | Accuracy: 38.1
multiple-choice-question-answering-mcqa-on-11 | OPT (few-shot, k=5) | Accuracy: 30.6
multiple-choice-question-answering-mcqa-on-11 | GAL 120B (zero-shot) | Accuracy: 68.8
multiple-choice-question-answering-mcqa-on-11 | BLOOM (few-shot, k=5) | Accuracy: 28.5
multiple-choice-question-answering-mcqa-on-11 | Gopher (few-shot, k=5) | Accuracy: 70.8
multiple-choice-question-answering-mcqa-on-11 | Chinchilla (few-shot, k=5) | Accuracy: 79.9
multiple-choice-question-answering-mcqa-on-12 | OPT (few-shot, k=5) | Accuracy: 27.7
multiple-choice-question-answering-mcqa-on-12 | GAL 120B (zero-shot) | Accuracy: 69.4
multiple-choice-question-answering-mcqa-on-12 | Chinchilla (few-shot, k=5) | Accuracy: 80.3
multiple-choice-question-answering-mcqa-on-12 | BLOOM (few-shot, k=5) | Accuracy: 29.4
multiple-choice-question-answering-mcqa-on-12 | Gopher (few-shot, k=5) | Accuracy: 71.3
multiple-choice-question-answering-mcqa-on-13 | Chinchilla (few-shot, k=5) | Accuracy: 51
multiple-choice-question-answering-mcqa-on-13 | BLOOM (few-shot, k=5) | Accuracy: 19
multiple-choice-question-answering-mcqa-on-13 | GAL 120B (zero-shot) | Accuracy: 46
multiple-choice-question-answering-mcqa-on-13 | OPT (few-shot, k=5) | Accuracy: 30
multiple-choice-question-answering-mcqa-on-13 | Gopher (few-shot, k=5) | Accuracy: 45
multiple-choice-question-answering-mcqa-on-14 | BLOOM (few-shot, k=5) | Accuracy: 23.2
multiple-choice-question-answering-mcqa-on-14 | OPT (few-shot, k=5) | Accuracy: 21.7
multiple-choice-question-answering-mcqa-on-14 | GAL 120B (zero-shot) | Accuracy: 47.8
multiple-choice-question-answering-mcqa-on-14 | Chinchilla (few-shot, k=5) | Accuracy: 58.1
multiple-choice-question-answering-mcqa-on-15 | Chinchilla (few-shot, k=5) | Accuracy: 51.0
multiple-choice-question-answering-mcqa-on-15 | GAL 120B (zero-shot) | Accuracy: 49
multiple-choice-question-answering-mcqa-on-15 | BLOOM (few-shot, k=5) | Accuracy: 6.0
multiple-choice-question-answering-mcqa-on-15 | OPT (few-shot, k=5) | Accuracy: 17.0
multiple-choice-question-answering-mcqa-on-16 | Gopher (few-shot, k=5) | Accuracy: 23.7
multiple-choice-question-answering-mcqa-on-16 | Chinchilla (few-shot, k=5) | Accuracy: 31.9
multiple-choice-question-answering-mcqa-on-16 | BLOOM (few-shot, k=5) | Accuracy: 27
multiple-choice-question-answering-mcqa-on-16 | GAL 120B (zero-shot) | Accuracy: 32.6
multiple-choice-question-answering-mcqa-on-16 | OPT (few-shot, k=5) | Accuracy: 24.4
multiple-choice-question-answering-mcqa-on-17 | GAL 120B (zero-shot) | Accuracy: 62.8
multiple-choice-question-answering-mcqa-on-17 | BLOOM (few-shot, k=5) | Accuracy: 32.4
multiple-choice-question-answering-mcqa-on-17 | Gopher (few-shot, k=5) | Accuracy: 60
multiple-choice-question-answering-mcqa-on-17 | Chinchilla (few-shot, k=5) | Accuracy: 62.1
multiple-choice-question-answering-mcqa-on-17 | OPT (few-shot, k=5) | Accuracy: 36.6
multiple-choice-question-answering-mcqa-on-18 | GAL 120B (zero-shot) | Accuracy: 42.2
multiple-choice-question-answering-mcqa-on-18 | OPT (few-shot, k=5) | Accuracy: 21.6
multiple-choice-question-answering-mcqa-on-18 | Gopher (few-shot, k=5) | Accuracy: 34.3
multiple-choice-question-answering-mcqa-on-18 | BLOOM (few-shot, k=5) | Accuracy: 18.6
multiple-choice-question-answering-mcqa-on-18 | Chinchilla (few-shot, k=5) | Accuracy: 46.1
multiple-choice-question-answering-mcqa-on-19 | OPT (few-shot, k=5) | Accuracy: 29.8
multiple-choice-question-answering-mcqa-on-19 | GAL 120B (zero-shot) | Accuracy: 33.8
multiple-choice-question-answering-mcqa-on-19 | BLOOM (few-shot, k=5) | Accuracy: 25.2
multiple-choice-question-answering-mcqa-on-19 | Chinchilla (few-shot, k=5) | Accuracy: 36.4
multiple-choice-question-answering-mcqa-on-2 | Gopher (few-shot, k=5) | Accuracy: 35.7
multiple-choice-question-answering-mcqa-on-2 | GAL 120B (zero-shot) | Accuracy: 32.5
multiple-choice-question-answering-mcqa-on-2 | BLOOM (few-shot, k=5) | Accuracy: 26.2
multiple-choice-question-answering-mcqa-on-2 | OPT (few-shot, k=5) | Accuracy: 29.4
multiple-choice-question-answering-mcqa-on-2 | Chinchilla (few-shot, k=5) | Accuracy: 33.3
multiple-choice-question-answering-mcqa-on-20 | Gopher (few-shot, k=5) | Accuracy: 50
multiple-choice-question-answering-mcqa-on-20 | OPT (few-shot, k=5) | Accuracy: 43.5
multiple-choice-question-answering-mcqa-on-20 | Chinchilla (few-shot, k=5) | Accuracy: 58.8
multiple-choice-question-answering-mcqa-on-20 | GAL 120B (zero-shot) | Accuracy: 41.2
multiple-choice-question-answering-mcqa-on-20 | BLOOM (few-shot, k=5) | Accuracy: 19.4
multiple-choice-question-answering-mcqa-on-21 | OPT (few-shot, k=5) | Dev Set (Acc-%): 0.296
multiple-choice-question-answering-mcqa-on-21 | GAL 120B (zero-shot) | Dev Set (Acc-%): 0.529
multiple-choice-question-answering-mcqa-on-21 | BLOOM (few-shot, k=5) | Dev Set (Acc-%): 0.325
multiple-choice-question-answering-mcqa-on-3 | Gopher (few-shot, k=5) | Accuracy: 25
multiple-choice-question-answering-mcqa-on-3 | Chinchilla (few-shot, k=5) | Accuracy: 31
multiple-choice-question-answering-mcqa-on-3 | OPT (few-shot, k=5) | Accuracy: 21
multiple-choice-question-answering-mcqa-on-3 | GAL 120B (zero-shot) | Accuracy: 27
multiple-choice-question-answering-mcqa-on-3 | GAL 30B (zero-shot) | Accuracy: 33.3
multiple-choice-question-answering-mcqa-on-4 | GAL 120B (zero-shot) | Accuracy: 42.1
multiple-choice-question-answering-mcqa-on-4 | OPT (few-shot, k=5) | Accuracy: 21
multiple-choice-question-answering-mcqa-on-4 | BLOOM (few-shot, k=5) | Accuracy: 23.7
multiple-choice-question-answering-mcqa-on-4 | Chinchilla (few-shot, k=5) | Accuracy: 38.6
multiple-choice-question-answering-mcqa-on-4 | Gopher (few-shot, k=5) | Accuracy: 43
multiple-choice-question-answering-mcqa-on-5 | OPT (few-shot, k=5) | Accuracy: 30
multiple-choice-question-answering-mcqa-on-5 | Chinchilla (few-shot, k=5) | Accuracy: 58
multiple-choice-question-answering-mcqa-on-5 | Gopher (few-shot, k=5) | Accuracy: 54
multiple-choice-question-answering-mcqa-on-5 | GAL 120B (zero-shot) | Accuracy: 70
multiple-choice-question-answering-mcqa-on-5 | BLOOM (few-shot, k=5) | Accuracy: 25
multiple-choice-question-answering-mcqa-on-6 | Chinchilla (few-shot, k=5) | Accuracy: 41.1
multiple-choice-question-answering-mcqa-on-6 | GAL 120B (zero-shot) | Accuracy: 38.4
multiple-choice-question-answering-mcqa-on-6 | BLOOM (few-shot, k=5) | Accuracy: 25
multiple-choice-question-answering-mcqa-on-6 | OPT (few-shot, k=5) | Accuracy: 28.6
multiple-choice-question-answering-mcqa-on-7 | BLOOM (few-shot, k=5) | Accuracy: 25
multiple-choice-question-answering-mcqa-on-7 | Gopher (few-shot, k=5) | Accuracy: 37
multiple-choice-question-answering-mcqa-on-7 | Chinchilla (few-shot, k=5) | Accuracy: 32
multiple-choice-question-answering-mcqa-on-7 | GAL 120B (zero-shot) | Accuracy: 43
multiple-choice-question-answering-mcqa-on-7 | OPT (few-shot, k=5) | Accuracy: 33
multiple-choice-question-answering-mcqa-on-8 | BLOOM (few-shot, k=5) | Accuracy: 36
multiple-choice-question-answering-mcqa-on-8 | Chinchilla (few-shot, k=5) | Accuracy: 69
multiple-choice-question-answering-mcqa-on-8 | GAL 30B (zero-shot) | Accuracy: 70
multiple-choice-question-answering-mcqa-on-8 | GAL 120B (zero-shot) | Accuracy: 68
multiple-choice-question-answering-mcqa-on-8 | OPT (few-shot, k=5) | Accuracy: 35
multiple-choice-question-answering-mcqa-on-9 | BLOOM (few-shot, k=5) | Accuracy: 25.7
multiple-choice-question-answering-mcqa-on-9 | GAL 120B (zero-shot) | Accuracy: 65.1
multiple-choice-question-answering-mcqa-on-9 | Gopher (few-shot, k=5) | Accuracy: 65.8
multiple-choice-question-answering-mcqa-on-9 | OPT (few-shot, k=5) | Accuracy: 23.0
multiple-choice-question-answering-mcqa-on-9 | Chinchilla (few-shot, k=5) | Accuracy: 73.0
protein-function-prediction-on-caspsimseq | GAL 1.3B | ROUGE-L: 0.069
protein-function-prediction-on-caspsimseq | GAL 30B | ROUGE-L: 0.137
protein-function-prediction-on-caspsimseq | GAL 120B | ROUGE-L: 0.252
protein-function-prediction-on-caspsimseq | GAL 6.7B | ROUGE-L: 0.109
protein-function-prediction-on-caspsimseq | GAL 125M | ROUGE-L: 0.062
protein-function-prediction-on-paenseq | GAL 30B | ROUGE-L: 0.196
protein-function-prediction-on-paenseq | GAL 120B | ROUGE-L: 0.272
protein-function-prediction-on-paenseq | GAL 1.3B | ROUGE-L: 0.084
protein-function-prediction-on-paenseq | GAL 125M | ROUGE-L: 0.073
protein-function-prediction-on-paenseq | GAL 6.7B | ROUGE-L: 0.137
protein-function-prediction-on-uniprotseq | GAL 30B | ROUGE-L: 0.186
protein-function-prediction-on-uniprotseq | GAL 125M | ROUGE-L: 0.061
protein-function-prediction-on-uniprotseq | GAL 120B | ROUGE-L: 0.252
protein-function-prediction-on-uniprotseq | GAL 6.7B | ROUGE-L: 0.111
protein-function-prediction-on-uniprotseq | GAL 1.3B | ROUGE-L: 0.079
protein-structure-prediction-on-caspseq | GAL 6.7B | Validation perplexity: 17.29
protein-structure-prediction-on-caspseq | GAL 1.3B | Validation perplexity: 17.58
protein-structure-prediction-on-caspseq | GAL 30B | Validation perplexity: 17.27
protein-structure-prediction-on-caspseq | GAL 125M | Validation perplexity: 20.62
protein-structure-prediction-on-caspseq | GAL 120B | Validation perplexity: 17.26
protein-structure-prediction-on-caspsimseq | GAL 1.3B | Validation perplexity: 17.04
protein-structure-prediction-on-caspsimseq | GAL 30B | Validation perplexity: 15.42
protein-structure-prediction-on-caspsimseq | GAL 125M | Validation perplexity: 19.18
protein-structure-prediction-on-caspsimseq | GAL 6.7B | Validation perplexity: 16.35
protein-structure-prediction-on-caspsimseq | GAL 120B | Validation perplexity: 12.77
protein-structure-prediction-on-paenseq | GAL 30B | Validation perplexity: 4.28
protein-structure-prediction-on-paenseq | GAL 6.7B | Validation perplexity: 7.76
protein-structure-prediction-on-paenseq | GAL 120B | Validation perplexity: 3.14
protein-structure-prediction-on-paenseq | GAL 1.3B | Validation perplexity: 12.53
protein-structure-prediction-on-paenseq | GAL 125M | Validation perplexity: 16.35
protein-structure-prediction-on-uniprotseq | GAL 6.7B | Validation perplexity: 11.58
protein-structure-prediction-on-uniprotseq | GAL 125M | Validation perplexity: 19.05
protein-structure-prediction-on-uniprotseq | GAL 1.3B | Validation perplexity: 15.82
protein-structure-prediction-on-uniprotseq | GAL 120B | Validation perplexity: 5.54
protein-structure-prediction-on-uniprotseq | GAL 30B | Validation perplexity: 8.23
question-answering-on-bioasq | GAL 120B (zero-shot) | Accuracy: 94.3
question-answering-on-bioasq | BLOOM (zero-shot) | Accuracy: 91.4
question-answering-on-bioasq | OPT (zero-shot) | Accuracy: 81.4
question-answering-on-medqa-usmle | GAL 120B (zero-shot) | Accuracy: 44.4
question-answering-on-medqa-usmle | OPT (few-shot, k=5) | Accuracy: 22.8
question-answering-on-medqa-usmle | BLOOM (few-shot, k=5) | Accuracy: 23.3
question-answering-on-pubmedqa | GAL 120B (zero-shot) | Accuracy: 77.6
question-answering-on-pubmedqa | BLOOM (zero-shot) | Accuracy: 73.6
question-answering-on-pubmedqa | OPT (zero-shot) | Accuracy: 70.2
question-answering-on-truthfulqa | GAL 6.7B | MC1: 0.19
question-answering-on-truthfulqa | GAL 30B | MC1: 0.24
question-answering-on-truthfulqa | GAL 1.3B | MC1: 0.19
question-answering-on-truthfulqa | GAL 120B | MC1: 0.26
question-answering-on-truthfulqa | GAL 125M | MC1: 0.19
question-answering-on-truthfulqa | OPT 175B | MC1: 0.21
stereotypical-bias-analysis-on-crows-pairs | GAL 120B | Age: 69; Disability: 66.7; Gender: 51.9; Nationality: 51.6; Overall: 60.5; Physical Appearance: 58.7; Race/Color: 59.9; Religion: 51.9; Sexual Orientation: 77.4; Socioeconomic status: 65.7
tdc-admet-benchmarking-group-on-tdcommons | Galactica-GAL-120B | TDC.BBB_Martins: 0.661
tdc-admet-benchmarking-group-on-tdcommons | Galactica-GAL-125M | TDC.BBB_Martins: 0.393
tdc-admet-benchmarking-group-on-tdcommons | Galactica-GAL-30B | TDC.BBB_Martins: 0.596
tdc-admet-benchmarking-group-on-tdcommons | Galactica-GAL-6.7B | TDC.BBB_Martins: 0.535
tdc-admet-benchmarking-group-on-tdcommons | Galactica-GAL-1.3B | TDC.BBB_Martins: 0.604
word-sense-disambiguation-on-big-bench | GAL 120B (few-shot, k=5) | Accuracy: 48.7
word-sense-disambiguation-on-big-bench | BLOOM 176B | Accuracy: 1.3
word-sense-disambiguation-on-big-bench | GAL 30B (few-shot, k=5) | Accuracy: 47.0
word-sense-disambiguation-on-big-bench | OPT 175B | Accuracy: 49.1
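
The StereoSet entries at the top of the benchmark list report three related numbers: LMS (language modeling score), SS (stereotype score), and ICAT. As a rough sanity check, the sketch below recomputes ICAT from the listed LMS and SS values, assuming the StereoSet definition ICAT = LMS * min(SS, 100 - SS) / 50. The function name `icat` and the `rows` dictionary are illustrative, not part of any official tooling. Because the inputs are already rounded, a recomputed value can differ from the listed one in the last digit (for example, about 65.7 versus the listed 65.6 for GAL 120B).

```python
def icat(lms: float, ss: float) -> float:
    """Idealized CAT score, assuming the StereoSet definition.

    lms: language modeling score in percent (higher is better).
    ss:  stereotype score in percent (50 means no measured preference).
    """
    return lms * min(ss, 100.0 - ss) / 50.0


# (LMS, SS) pairs copied from the StereoSet rows of the benchmark list.
rows = {
    "OPT 175B": (74.8, 59.9),
    "GAL 120B": (75.0, 56.2),
    "GPT-3 (text-davinci-002)": (77.6, 60.8),
}

for name, (lms, ss) in rows.items():
    print(f"{name}: ICAT ~ {icat(lms, ss):.1f}")
```

Under this definition a model with SS = 50 keeps its full LMS as its ICAT score, and the score shrinks linearly as SS moves away from 50 in either direction, which is why GAL 120B ranks highest on ICAT here despite having a lower LMS than GPT-3.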
