Galactica: A Large Language Model for Science

Ross Taylor; Marcin Kardas; Guillem Cucurull; Thomas Scialom; Anthony Hartshorn; Elvis Saravia; Andrew Poulton; Viktor Kerkez; Robert Stojnic

Abstract

Information overload is a major obstacle to scientific progress. The explosive growth in scientific literature and data has made it ever harder to discover useful insights in a large mass of information. Today scientific knowledge is accessed through search engines, but they are unable to organize scientific knowledge alone. In this paper we introduce Galactica: a large language model that can store, combine and reason about scientific knowledge. We train on a large scientific corpus of papers, reference material, knowledge bases and many other sources. We outperform existing models on a range of scientific tasks. On technical knowledge probes such as LaTeX equations, Galactica outperforms the latest GPT-3 by 68.2% versus 49.0%. Galactica also performs well on reasoning, outperforming Chinchilla on mathematical MMLU by 41.3% to 35.7%, and PaLM 540B on MATH with a score of 20.4% versus 8.8%. It also sets a new state-of-the-art on downstream tasks such as PubMedQA and MedMCQA dev of 77.6% and 52.9%. And despite not being trained on a general corpus, Galactica outperforms BLOOM and OPT-175B on BIG-bench. We believe these results demonstrate the potential for language models as a new interface for science. We open source the model for the benefit of the scientific community.
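The head-to-head figures quoted in the abstract are pairs of scores rather than margins. As a small arithmetic sketch, the absolute gap (in percentage points) and the relative gain for each pair can be worked out as below. The dictionary layout and the helper name `gap` are illustrative assumptions, not part of the paper or of the galai library.

```python
# Score pairs quoted in the abstract: (Galactica, baseline), both in percent.
ABSTRACT_PAIRS = {
    "LaTeX equations vs GPT-3": (68.2, 49.0),
    "Mathematical MMLU vs Chinchilla": (41.3, 35.7),
    "MATH vs PaLM 540B": (20.4, 8.8),
}


def gap(galactica, baseline):
    """Return (absolute gap in percentage points, relative gain in percent)."""
    points = galactica - baseline
    relative = 100.0 * points / baseline
    return round(points, 1), round(relative, 1)


for name, (gal, base) in ABSTRACT_PAIRS.items():
    points, relative = gap(gal, base)
    print(f"{name}: +{points} points ({relative}% relative)")
```

Read this way, "68.2% versus 49.0%" is a lead of 19.2 percentage points, which is a different quantity from a 68.2% improvement.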

Code Repositories

paperswithcode/galai (official, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
bias-detection-on-stereoset-1 | OPT 175B | ICAT Score: 60; LMS: 74.8; SS: 59.9
bias-detection-on-stereoset-1 | GAL 120B | ICAT Score: 65.6; LMS: 75; SS: 56.2
bias-detection-on-stereoset-1 | GPT-3 (text-davinci-002) | ICAT Score: 60.8; LMS: 77.6; SS: 60.8
common-sense-reasoning-on-arc-challenge | BLOOM (few-shot, k=5) | Accuracy: 32.9
common-sense-reasoning-on-arc-challenge | GAL 120B (zero-shot) | Accuracy: 67.9
common-sense-reasoning-on-arc-challenge | OPT (few-shot, k=5) | Accuracy: 31.1
common-sense-reasoning-on-arc-challenge | GPT-3 (zero-shot) | Accuracy: 51.4
common-sense-reasoning-on-arc-easy | GAL 120B (0-shot) | Accuracy: 83.8
common-sense-reasoning-on-arc-easy | BLOOM (5-shot) | Accuracy: 40.7
common-sense-reasoning-on-arc-easy | GPT-3 (zero-shot) | Accuracy: 68.8
common-sense-reasoning-on-arc-easy | OPT (5-shot) | Accuracy: 37.4
math-word-problem-solving-on-math | GAL 120B <work> | Accuracy: 16.6; Parameters (Billions): 120
math-word-problem-solving-on-math | GAL 120B (5-shot) mCoT | Accuracy: 20.4; Parameters (Billions): 120
math-word-problem-solving-on-math | Minerva 540B (5-shot) mCoT | Accuracy: 33.6; Parameters (Billions): 540
math-word-problem-solving-on-math | GAL 30B <work> | Accuracy: 11.4; Parameters (Billions): 30
math-word-problem-solving-on-math | PaLM 540B (5-shot) mCoT | Accuracy: 8.8; Parameters (Billions): 540
math-word-problem-solving-on-math | GPT-3 175B (8-shot) | Accuracy: 5.2; Parameters (Billions): 175
math-word-problem-solving-on-math | GAL 30B (5-shot) mCoT | Accuracy: 12.7; Parameters (Billions): 30
mathematical-reasoning-on-mmlu-mathematics | GAL 120B <work> | Accuracy: 41.3
molecular-property-prediction-on-bace-1 | GAL 1.3B | ROC-AUC: 57.6
molecular-property-prediction-on-bace-1 | GAL 30B | ROC-AUC: 72.7
molecular-property-prediction-on-bace-1 | GAL 125M | ROC-AUC: 56.1
molecular-property-prediction-on-bace-1 | GAL 120B | ROC-AUC: 61.7
molecular-property-prediction-on-bace-1 | GAL 6.7B | ROC-AUC: 58.4
molecular-property-prediction-on-bbbp-1 | GAL 6.7B | ROC-AUC: 53.5
molecular-property-prediction-on-bbbp-1 | GAL 125M | ROC-AUC: 39.3
molecular-property-prediction-on-bbbp-1 | GAL 120B | ROC-AUC: 66.1
molecular-property-prediction-on-bbbp-1 | Uni-Mol | ROC-AUC: 72.9
molecular-property-prediction-on-bbbp-1 | GAL 30B | ROC-AUC: 59.6
molecular-property-prediction-on-bbbp-1 | GAL 1.3B | ROC-AUC: 60.4
molecular-property-prediction-on-clintox-1 | GAL 1.3B | Molecules (M): 2; ROC-AUC: 58.9
molecular-property-prediction-on-clintox-1 | GAL 125M | Molecules (M): 2; ROC-AUC: 51.8
molecular-property-prediction-on-clintox-1 | GAL 120B | Molecules (M): 2; ROC-AUC: 82.6
molecular-property-prediction-on-clintox-1 | GAL 6.7B | Molecules (M): 2; ROC-AUC: 78.4
molecular-property-prediction-on-clintox-1 | GAL 30B | Molecules (M): 2; ROC-AUC: 82.2
molecular-property-prediction-on-hiv-dataset | GAL 30B | AUC: 0.759
molecular-property-prediction-on-hiv-dataset | GAL 1.3B | AUC: 0.724
molecular-property-prediction-on-hiv-dataset | GAL 125M | AUC: 0.702
molecular-property-prediction-on-hiv-dataset | GAL 6.7B | AUC: 0.722
molecular-property-prediction-on-hiv-dataset | Uni-Mol | AUC: 0.808
molecular-property-prediction-on-hiv-dataset | GAL 120B | AUC: 0.745
molecular-property-prediction-on-moleculenet | GAL 30B | AUC: 0.69
molecular-property-prediction-on-moleculenet | GAL 125M | AUC: 0.581
molecular-property-prediction-on-moleculenet | GAL 1.3B | AUC: 0.619
molecular-property-prediction-on-moleculenet | GAL 6.7B | AUC: 0.64
molecular-property-prediction-on-moleculenet | Uni-Mol | AUC: 0.77
molecular-property-prediction-on-sider-1 | GAL 125M | ROC-AUC: 55.9
molecular-property-prediction-on-sider-1 | GAL 1.3B | ROC-AUC: 54.0
molecular-property-prediction-on-sider-1 | GAL 6.7B | ROC-AUC: 55.9
molecular-property-prediction-on-sider-1 | GAL 120B | ROC-AUC: 63.2
molecular-property-prediction-on-sider-1 | GAL 30B | ROC-AUC: 61.3
molecular-property-prediction-on-tox21-1 | GAL 125M | ROC-AUC: 54.3
molecular-property-prediction-on-tox21-1 | GAL 120B | ROC-AUC: 68.9
molecular-property-prediction-on-tox21-1 | Uni-Mol | ROC-AUC: 79.6
molecular-property-prediction-on-tox21-1 | GAL 6.7B | ROC-AUC: 63.9
molecular-property-prediction-on-tox21-1 | GAL 30B | ROC-AUC: 68.5
molecular-property-prediction-on-tox21-1 | GAL 1.3B | ROC-AUC: 60.6
multi-task-language-understanding-on-mmlu | GAL 120B (zero-shot) | Average (%): 52.6
multiple-choice-question-answering-mcqa-on-10 | BLOOM (few-shot, k=5) | Accuracy: 27.6
multiple-choice-question-answering-mcqa-on-10 | Gopher (few-shot, k=5) | Accuracy: 33.6
multiple-choice-question-answering-mcqa-on-10 | Chinchilla (few-shot, k=5) | Accuracy: 41.5
multiple-choice-question-answering-mcqa-on-10 | OPT (few-shot, k=5) | Accuracy: 25.7
multiple-choice-question-answering-mcqa-on-10 | GAL 120B (zero-shot) | Accuracy: 38.1
multiple-choice-question-answering-mcqa-on-11 | OPT (few-shot, k=5) | Accuracy: 30.6
multiple-choice-question-answering-mcqa-on-11 | GAL 120B (zero-shot) | Accuracy: 68.8
multiple-choice-question-answering-mcqa-on-11 | BLOOM (few-shot, k=5) | Accuracy: 28.5
multiple-choice-question-answering-mcqa-on-11 | Gopher (few-shot, k=5) | Accuracy: 70.8
multiple-choice-question-answering-mcqa-on-11 | Chinchilla (few-shot, k=5) | Accuracy: 79.9
multiple-choice-question-answering-mcqa-on-12 | OPT (few-shot, k=5) | Accuracy: 27.7
multiple-choice-question-answering-mcqa-on-12 | GAL 120B (zero-shot) | Accuracy: 69.4
multiple-choice-question-answering-mcqa-on-12 | Chinchilla (few-shot, k=5) | Accuracy: 80.3
multiple-choice-question-answering-mcqa-on-12 | BLOOM (few-shot, k=5) | Accuracy: 29.4
multiple-choice-question-answering-mcqa-on-12 | Gopher (few-shot, k=5) | Accuracy: 71.3
multiple-choice-question-answering-mcqa-on-13 | Chinchilla (few-shot, k=5) | Accuracy: 51
multiple-choice-question-answering-mcqa-on-13 | BLOOM (few-shot, k=5) | Accuracy: 19
multiple-choice-question-answering-mcqa-on-13 | GAL 120B (zero-shot) | Accuracy: 46
multiple-choice-question-answering-mcqa-on-13 | OPT (few-shot, k=5) | Accuracy: 30
multiple-choice-question-answering-mcqa-on-13 | Gopher (few-shot, k=5) | Accuracy: 45
multiple-choice-question-answering-mcqa-on-14 | BLOOM (few-shot, k=5) | Accuracy: 23.2
multiple-choice-question-answering-mcqa-on-14 | OPT (few-shot, k=5) | Accuracy: 21.7
multiple-choice-question-answering-mcqa-on-14 | GAL 120B (zero-shot) | Accuracy: 47.8
multiple-choice-question-answering-mcqa-on-14 | Chinchilla (few-shot, k=5) | Accuracy: 58.1
multiple-choice-question-answering-mcqa-on-15 | Chinchilla (few-shot, k=5) | Accuracy: 51.0
multiple-choice-question-answering-mcqa-on-15 | GAL 120B (zero-shot) | Accuracy: 49
multiple-choice-question-answering-mcqa-on-15 | BLOOM (few-shot, k=5) | Accuracy: 6.0
multiple-choice-question-answering-mcqa-on-15 | OPT (few-shot, k=5) | Accuracy: 17.0
multiple-choice-question-answering-mcqa-on-16 | Gopher (few-shot, k=5) | Accuracy: 23.7
multiple-choice-question-answering-mcqa-on-16 | Chinchilla (few-shot, k=5) | Accuracy: 31.9
multiple-choice-question-answering-mcqa-on-16 | BLOOM (few-shot, k=5) | Accuracy: 27
multiple-choice-question-answering-mcqa-on-16 | GAL 120B (zero-shot) | Accuracy: 32.6
multiple-choice-question-answering-mcqa-on-16 | OPT (few-shot, k=5) | Accuracy: 24.4
multiple-choice-question-answering-mcqa-on-17 | GAL 120B (zero-shot) | Accuracy: 62.8
multiple-choice-question-answering-mcqa-on-17 | BLOOM (few-shot, k=5) | Accuracy: 32.4
multiple-choice-question-answering-mcqa-on-17 | Gopher (few-shot, k=5) | Accuracy: 60
multiple-choice-question-answering-mcqa-on-17 | Chinchilla (few-shot, k=5) | Accuracy: 62.1
multiple-choice-question-answering-mcqa-on-17 | OPT (few-shot, k=5) | Accuracy: 36.6
multiple-choice-question-answering-mcqa-on-18 | GAL 120B (zero-shot) | Accuracy: 42.2
multiple-choice-question-answering-mcqa-on-18 | OPT (few-shot, k=5) | Accuracy: 21.6
multiple-choice-question-answering-mcqa-on-18 | Gopher (few-shot, k=5) | Accuracy: 34.3
multiple-choice-question-answering-mcqa-on-18 | BLOOM (few-shot, k=5) | Accuracy: 18.6
multiple-choice-question-answering-mcqa-on-18 | Chinchilla (few-shot, k=5) | Accuracy: 46.1
multiple-choice-question-answering-mcqa-on-19 | OPT (few-shot, k=5) | Accuracy: 29.8
multiple-choice-question-answering-mcqa-on-19 | GAL 120B (zero-shot) | Accuracy: 33.8
multiple-choice-question-answering-mcqa-on-19 | BLOOM (few-shot, k=5) | Accuracy: 25.2
multiple-choice-question-answering-mcqa-on-19 | Chinchilla (few-shot, k=5) | Accuracy: 36.4
multiple-choice-question-answering-mcqa-on-2 | Gopher (few-shot, k=5) | Accuracy: 35.7
multiple-choice-question-answering-mcqa-on-2 | GAL 120B (zero-shot) | Accuracy: 32.5
multiple-choice-question-answering-mcqa-on-2 | BLOOM (few-shot, k=5) | Accuracy: 26.2
multiple-choice-question-answering-mcqa-on-2 | OPT (few-shot, k=5) | Accuracy: 29.4
multiple-choice-question-answering-mcqa-on-2 | Chinchilla (few-shot, k=5) | Accuracy: 33.3
multiple-choice-question-answering-mcqa-on-20 | Gopher (few-shot, k=5) | Accuracy: 50
multiple-choice-question-answering-mcqa-on-20 | OPT (few-shot, k=5) | Accuracy: 43.5
multiple-choice-question-answering-mcqa-on-20 | Chinchilla (few-shot, k=5) | Accuracy: 58.8
multiple-choice-question-answering-mcqa-on-20 | GAL 120B (zero-shot) | Accuracy: 41.2
multiple-choice-question-answering-mcqa-on-20 | BLOOM (few-shot, k=5) | Accuracy: 19.4
multiple-choice-question-answering-mcqa-on-21 | OPT (few-shot, k=5) | Dev Set (Acc-%): 29.6
multiple-choice-question-answering-mcqa-on-21 | GAL 120B (zero-shot) | Dev Set (Acc-%): 52.9
multiple-choice-question-answering-mcqa-on-21 | BLOOM (few-shot, k=5) | Dev Set (Acc-%): 32.5
multiple-choice-question-answering-mcqa-on-3 | Gopher (few-shot, k=5) | Accuracy: 25
multiple-choice-question-answering-mcqa-on-3 | Chinchilla (few-shot, k=5) | Accuracy: 31
multiple-choice-question-answering-mcqa-on-3 | OPT (few-shot, k=5) | Accuracy: 21
multiple-choice-question-answering-mcqa-on-3 | GAL 120B (zero-shot) | Accuracy: 27
multiple-choice-question-answering-mcqa-on-3 | GAL 30B (zero-shot) | Accuracy: 33.3
multiple-choice-question-answering-mcqa-on-4 | GAL 120B (zero-shot) | Accuracy: 42.1
multiple-choice-question-answering-mcqa-on-4 | OPT (few-shot, k=5) | Accuracy: 21
multiple-choice-question-answering-mcqa-on-4 | BLOOM (few-shot, k=5) | Accuracy: 23.7
multiple-choice-question-answering-mcqa-on-4 | Chinchilla (few-shot, k=5) | Accuracy: 38.6
multiple-choice-question-answering-mcqa-on-4 | Gopher (few-shot, k=5) | Accuracy: 43
multiple-choice-question-answering-mcqa-on-5 | OPT (few-shot, k=5) | Accuracy: 30
multiple-choice-question-answering-mcqa-on-5 | Chinchilla (few-shot, k=5) | Accuracy: 58
multiple-choice-question-answering-mcqa-on-5 | Gopher (few-shot, k=5) | Accuracy: 54
multiple-choice-question-answering-mcqa-on-5 | GAL 120B (zero-shot) | Accuracy: 70
multiple-choice-question-answering-mcqa-on-5 | BLOOM (few-shot, k=5) | Accuracy: 25
multiple-choice-question-answering-mcqa-on-6 | Chinchilla (few-shot, k=5) | Accuracy: 41.1
multiple-choice-question-answering-mcqa-on-6 | GAL 120B (zero-shot) | Accuracy: 38.4
multiple-choice-question-answering-mcqa-on-6 | BLOOM (few-shot, k=5) | Accuracy: 25
multiple-choice-question-answering-mcqa-on-6 | OPT (few-shot, k=5) | Accuracy: 28.6
multiple-choice-question-answering-mcqa-on-7 | BLOOM (few-shot, k=5) | Accuracy: 25
multiple-choice-question-answering-mcqa-on-7 | Gopher (few-shot, k=5) | Accuracy: 37
multiple-choice-question-answering-mcqa-on-7 | Chinchilla (few-shot, k=5) | Accuracy: 32
multiple-choice-question-answering-mcqa-on-7 | GAL 120B (zero-shot) | Accuracy: 43
multiple-choice-question-answering-mcqa-on-7 | OPT (few-shot, k=5) | Accuracy: 33
multiple-choice-question-answering-mcqa-on-8 | BLOOM (few-shot, k=5) | Accuracy: 36
multiple-choice-question-answering-mcqa-on-8 | Chinchilla (few-shot, k=5) | Accuracy: 69
multiple-choice-question-answering-mcqa-on-8 | GAL 30B (zero-shot) | Accuracy: 70
multiple-choice-question-answering-mcqa-on-8 | GAL 120B (zero-shot) | Accuracy: 68
multiple-choice-question-answering-mcqa-on-8 | OPT (few-shot, k=5) | Accuracy: 35
multiple-choice-question-answering-mcqa-on-9 | BLOOM (few-shot, k=5) | Accuracy: 25.7
multiple-choice-question-answering-mcqa-on-9 | GAL 120B (zero-shot) | Accuracy: 65.1
multiple-choice-question-answering-mcqa-on-9 | Gopher (few-shot, k=5) | Accuracy: 65.8
multiple-choice-question-answering-mcqa-on-9 | OPT (few-shot, k=5) | Accuracy: 23.0
multiple-choice-question-answering-mcqa-on-9 | Chinchilla (few-shot, k=5) | Accuracy: 73.0
protein-function-prediction-on-caspsimseq | GAL 1.3B | ROUGE-L: 0.069
protein-function-prediction-on-caspsimseq | GAL 30B | ROUGE-L: 0.137
protein-function-prediction-on-caspsimseq | GAL 120B | ROUGE-L: 0.252
protein-function-prediction-on-caspsimseq | GAL 6.7B | ROUGE-L: 0.109
protein-function-prediction-on-caspsimseq | GAL 125M | ROUGE-L: 0.062
protein-function-prediction-on-paenseq | GAL 30B | ROUGE-L: 0.196
protein-function-prediction-on-paenseq | GAL 120B | ROUGE-L: 0.272
protein-function-prediction-on-paenseq | GAL 1.3B | ROUGE-L: 0.084
protein-function-prediction-on-paenseq | GAL 125M | ROUGE-L: 0.073
protein-function-prediction-on-paenseq | GAL 6.7B | ROUGE-L: 0.137
protein-function-prediction-on-uniprotseq | GAL 30B | ROUGE-L: 0.186
protein-function-prediction-on-uniprotseq | GAL 125M | ROUGE-L: 0.061
protein-function-prediction-on-uniprotseq | GAL 120B | ROUGE-L: 0.252
protein-function-prediction-on-uniprotseq | GAL 6.7B | ROUGE-L: 0.111
protein-function-prediction-on-uniprotseq | GAL 1.3B | ROUGE-L: 0.079
protein-structure-prediction-on-caspseq | GAL 6.7B | Validation perplexity: 17.29
protein-structure-prediction-on-caspseq | GAL 1.3B | Validation perplexity: 17.58
protein-structure-prediction-on-caspseq | GAL 30B | Validation perplexity: 17.27
protein-structure-prediction-on-caspseq | GAL 125M | Validation perplexity: 20.62
protein-structure-prediction-on-caspseq | GAL 120B | Validation perplexity: 17.26
protein-structure-prediction-on-caspsimseq | GAL 1.3B | Validation perplexity: 17.04
protein-structure-prediction-on-caspsimseq | GAL 30B | Validation perplexity: 15.42
protein-structure-prediction-on-caspsimseq | GAL 125M | Validation perplexity: 19.18
protein-structure-prediction-on-caspsimseq | GAL 6.7B | Validation perplexity: 16.35
protein-structure-prediction-on-caspsimseq | GAL 120B | Validation perplexity: 12.77
protein-structure-prediction-on-paenseq | GAL 30B | Validation perplexity: 4.28
protein-structure-prediction-on-paenseq | GAL 6.7B | Validation perplexity: 7.76
protein-structure-prediction-on-paenseq | GAL 120B | Validation perplexity: 3.14
protein-structure-prediction-on-paenseq | GAL 1.3B | Validation perplexity: 12.53
protein-structure-prediction-on-paenseq | GAL 125M | Validation perplexity: 16.35
protein-structure-prediction-on-uniprotseq | GAL 6.7B | Validation perplexity: 11.58
protein-structure-prediction-on-uniprotseq | GAL 125M | Validation perplexity: 19.05
protein-structure-prediction-on-uniprotseq | GAL 1.3B | Validation perplexity: 15.82
protein-structure-prediction-on-uniprotseq | GAL 120B | Validation perplexity: 5.54
protein-structure-prediction-on-uniprotseq | GAL 30B | Validation perplexity: 8.23
question-answering-on-bioasq | GAL 120B (zero-shot) | Accuracy: 94.3
question-answering-on-bioasq | BLOOM (zero-shot) | Accuracy: 91.4
question-answering-on-bioasq | OPT (zero-shot) | Accuracy: 81.4
question-answering-on-medqa-usmle | GAL 120B (zero-shot) | Accuracy: 44.4
question-answering-on-medqa-usmle | OPT (few-shot, k=5) | Accuracy: 22.8
question-answering-on-medqa-usmle | BLOOM (few-shot, k=5) | Accuracy: 23.3
question-answering-on-pubmedqa | GAL 120B (zero-shot) | Accuracy: 77.6
question-answering-on-pubmedqa | BLOOM (zero-shot) | Accuracy: 73.6
question-answering-on-pubmedqa | OPT (zero-shot) | Accuracy: 70.2
question-answering-on-truthfulqa | GAL 6.7B | MC1: 0.19
question-answering-on-truthfulqa | GAL 30B | MC1: 0.24
question-answering-on-truthfulqa | GAL 1.3B | MC1: 0.19
question-answering-on-truthfulqa | GAL 120B | MC1: 0.26
question-answering-on-truthfulqa | GAL 125M | MC1: 0.19
question-answering-on-truthfulqa | OPT 175B | MC1: 0.21
stereotypical-bias-analysis-on-crows-pairs | GAL 120B | Age: 69; Disability: 66.7; Gender: 51.9; Nationality: 51.6; Overall: 60.5; Physical Appearance: 58.7; Race/Color: 59.9; Religion: 51.9; Sexual Orientation: 77.4; Socioeconomic status: 65.7
tdc-admet-benchmarking-group-on-tdcommons | Galactica-GAL-120B | TDC.BBB_Martins: 0.661
tdc-admet-benchmarking-group-on-tdcommons | Galactica-GAL-125M | TDC.BBB_Martins: 0.393
tdc-admet-benchmarking-group-on-tdcommons | Galactica-GAL-30B | TDC.BBB_Martins: 0.596
tdc-admet-benchmarking-group-on-tdcommons | Galactica-GAL-6.7B | TDC.BBB_Martins: 0.535
tdc-admet-benchmarking-group-on-tdcommons | Galactica-GAL-1.3B | TDC.BBB_Martins: 0.604
word-sense-disambiguation-on-big-bench | GAL 120B (few-shot, k=5) | Accuracy: 48.7
word-sense-disambiguation-on-big-bench | BLOOM 176B | Accuracy: 1.3
word-sense-disambiguation-on-big-bench | GAL 30B (few-shot, k=5) | Accuracy: 47.0
word-sense-disambiguation-on-big-bench | OPT 175B | Accuracy: 49.1
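
The three StereoSet metrics listed above are not independent. Assuming the ICAT definition used by the StereoSet benchmark, ICAT = LMS × min(SS, 100 − SS) / 50, the listed ICAT scores can be approximately reproduced from LMS and SS. The sketch below applies that formula to the three StereoSet rows; the function name `icat` and the `ROWS` layout are illustrative, and small differences are expected because the listed LMS and SS values are rounded.

```python
def icat(lms, ss):
    """Idealized CAT score: LMS scaled by how close SS is to the unbiased value 50."""
    return lms * min(ss, 100.0 - ss) / 50.0


# (LMS, SS, ICAT as listed) for the StereoSet rows above.
ROWS = {
    "OPT 175B": (74.8, 59.9, 60.0),
    "GAL 120B": (75.0, 56.2, 65.6),
    "GPT-3 (text-davinci-002)": (77.6, 60.8, 60.8),
}

for name, (lms, ss, listed) in ROWS.items():
    print(f"{name}: computed {icat(lms, ss):.1f}, listed {listed}")
```

Under this formula a model with a perfect language modeling score (LMS = 100) and no stereotype preference (SS = 50) would score 100, so GAL 120B's higher ICAT comes mainly from its SS being closer to 50 rather than from a higher LMS.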
