
Abstract
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. Our models are trained on trillions of tokens, and we show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.
Code Repositories
| Repository | Framework | Notes |
|---|---|---|
| vcskaushik/LLMzip | pytorch | Mentioned in GitHub |
| icalk-nlp/educhat | pytorch | Mentioned in GitHub |
| abhaskumarsinha/Corpus2GPT | pytorch | |
| kayvr/token-hawk | pytorch | Mentioned in GitHub |
| teelinsan/camoscio | pytorch | Mentioned in GitHub |
| krafton-ai/korani | pytorch | Mentioned in GitHub |
| akanyaani/miniLLAMA | pytorch | |
| beomi/koalpaca | pytorch | Mentioned in GitHub |
| chaoyi-wu/finetune_llama | jax | Mentioned in GitHub |
| freedomintelligence/huatuogpt | pytorch | Mentioned in GitHub |
| phoebussi/alpaca-cot | pytorch | Mentioned in GitHub |
| yuanmu97/secure-transformer-inference | pytorch | Mentioned in GitHub |
| facebookresearch/chai | pytorch | Mentioned in GitHub |
| Mind23-2/MindCode-140 | mindspore | |
| kbressem/medalpaca | pytorch | Mentioned in GitHub |
| xusenlinzy/api-for-open-llm | pytorch | Mentioned in GitHub |
| facebookresearch/llama | pytorch | Official; Mentioned in GitHub |
| aethercortex/llama-x | pytorch | Mentioned in GitHub |
| guinmoon/llmfarm | | Mentioned in GitHub |
| ganjinzero/rrhf | pytorch | Mentioned in GitHub |
| ohadrubin/rpt | jax | Mentioned in GitHub |
| squeezeailab/squeezellm | pytorch | Mentioned in GitHub |
| qwopqwop200/GPTQ-for-LLaMa | pytorch | Mentioned in GitHub |
| tatsu-lab/stanford_alpaca | pytorch | Mentioned in GitHub |
| stanfordbdhg/llama.cpp | | Mentioned in GitHub |
| replicate/cog_stanford_alpaca | pytorch | Mentioned in GitHub |
| zihanzhaosjtu/librisqa | | Mentioned in GitHub |
| huggingface/transformers | pytorch | Mentioned in GitHub |
| ggerganov/llama.cpp | pytorch | Mentioned in GitHub |
| ggml-org/llama.cpp | pytorch | Mentioned in GitHub |
| aozhongzhang/magr | pytorch | Mentioned in GitHub |
| fsoft-ai4code/codecapybara | pytorch | Mentioned in GitHub |
| young-geng/easylm | jax | Mentioned in GitHub |
| grantslatton/llama.cpp | | Mentioned in GitHub |
| chaoyi-wu/pmc-llama | pytorch | Mentioned in GitHub |
| ecolab-postech/owq | pytorch | Mentioned in GitHub |
| meta-llama/llama | pytorch | |
| batsresearch/alfred | pytorch | Mentioned in GitHub |
| llamafamily/llama-chinese | pytorch | Mentioned in GitHub |
| Lightning-AI/lit-llama | pytorch | |
| ntunlplab/traditional-chinese-alpaca | pytorch | Mentioned in GitHub |
| hamishivi/easylm | jax | Mentioned in GitHub |
| flagalpha/llama2-chinese | pytorch | Mentioned in GitHub |
| MS-P3/code5/tree/main/llama | mindspore | |
| longhao-chen/aicas2024 | pytorch | Mentioned in GitHub |
| fajri91/indommlu | pytorch | Mentioned in GitHub |
| ofa-sys/expertllama | pytorch | Mentioned in GitHub |
| ecnu-icalk/educhat | pytorch | Mentioned in GitHub |
| greenbitai/low_bit_llama | pytorch | Mentioned in GitHub |
| facico/chinese-vicuna | pytorch | Mentioned in GitHub |
| xvyaward/owq | pytorch | Mentioned in GitHub |
| xiaoman-zhang/PMC-VQA | pytorch | Mentioned in GitHub |
| MS-P3/code5/tree/main/llama2 | mindspore | |
| xzhang97666/alpacare | | Mentioned in GitHub |
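Many of the repositories above build on the Hugging Face transformers integration of LLaMA. As a quick orientation, here is a minimal inference sketch using that API. It assumes the transformers and accelerate packages are installed; the checkpoint ID is a placeholder for whichever converted LLaMA weights you actually have access to (the official weights are distributed by Meta under a research license), not an official identifier.

```python
# Minimal LLaMA inference sketch via the Hugging Face transformers API.
# Assumption: "huggyllama/llama-7b" stands in for your converted weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "huggyllama/llama-7b"  # placeholder checkpoint ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 halves memory: ~14 GB for 7B params
    device_map="auto",          # let accelerate place layers on GPU/CPU
)

prompt = "The theory of relativity states that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For CPU-only or low-memory setups, the llama.cpp ports listed above take a different route, running quantized weights natively instead of through PyTorch.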
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| arithmetic-reasoning-on-gsm8k | LLaMA 13B | Accuracy: 17.8 Parameters (Billion): 13 |
| arithmetic-reasoning-on-gsm8k | LLaMA 33B-maj1@k | Accuracy: 53.1 Parameters (Billion): 33 |
| arithmetic-reasoning-on-gsm8k | LLaMA 7B | Accuracy: 11.0 Parameters (Billion): 7 |
| arithmetic-reasoning-on-gsm8k | LLaMA 33B | Accuracy: 35.6 Parameters (Billion): 33 |
| arithmetic-reasoning-on-gsm8k | LLaMA 7B-maj1@k | Accuracy: 18.1 Parameters (Billion): 7 |
| arithmetic-reasoning-on-gsm8k | LLaMA 65B | Accuracy: 50.9 Parameters (Billion): 65 |
| arithmetic-reasoning-on-gsm8k | LLaMA 13B-maj1@k | Accuracy: 29.3 Parameters (Billion): 13 |
| arithmetic-reasoning-on-gsm8k | LLaMA 65B-maj1@k | Accuracy: 69.7 Parameters (Billion): 65 |
| code-generation-on-mbpp | LLaMA 33B (0-shot) | Accuracy: 30.2 |
| code-generation-on-mbpp | LLaMA 13B (0-shot) | Accuracy: 22 |
| code-generation-on-mbpp | LLaMA 65B (0-shot) | Accuracy: 37.7 |
| code-generation-on-mbpp | LLaMA 7B (0-shot) | Accuracy: 17.7 |
| common-sense-reasoning-on-arc-challenge | LLaMA 65B (zero-shot) | Accuracy: 56.0 |
| common-sense-reasoning-on-arc-challenge | LLaMA 7B (zero-shot) | Accuracy: 47.6 |
| common-sense-reasoning-on-arc-challenge | LLaMA 13B (zero-shot) | Accuracy: 52.7 |
| common-sense-reasoning-on-arc-challenge | LLaMA 33B (zero-shot) | Accuracy: 57.8 |
| common-sense-reasoning-on-arc-easy | LLaMA 13B (0-shot) | Accuracy: 74.8 |
| common-sense-reasoning-on-arc-easy | LLaMA 7B (0-shot) | Accuracy: 72.8 |
| common-sense-reasoning-on-arc-easy | LLaMA 33B (0-shot) | Accuracy: 80.0 |
| common-sense-reasoning-on-arc-easy | LLaMA 65B (0-shot) | Accuracy: 78.9 |
| common-sense-reasoning-on-winogrande | LLaMA 13B (0-shot) | Accuracy: 73.0 |
| common-sense-reasoning-on-winogrande | LLaMA 33B (0-shot) | Accuracy: 76.0 |
| common-sense-reasoning-on-winogrande | LLaMA 7B (0-shot) | Accuracy: 70.1 |
| common-sense-reasoning-on-winogrande | LLaMA 65B (0-shot) | Accuracy: 77.0 |
| few-shot-learning-on-medconceptsqa | meta-llama/Meta-Llama-3-8B-Instruct | Accuracy: 25.653 |
| math-word-problem-solving-on-math | LLaMA 13B | Accuracy: 3.9 Parameters (Billion): 13 |
| math-word-problem-solving-on-math | LLaMA 13B-maj1@k | Accuracy: 8.8 Parameters (Billion): 13 |
| math-word-problem-solving-on-math | LLaMA 7B | Accuracy: 2.9 Parameters (Billion): 7 |
| math-word-problem-solving-on-math | LLaMA 7B-maj1@k | Accuracy: 6.9 Parameters (Billion): 7 |
| math-word-problem-solving-on-math | LLaMA 65B | Accuracy: 10.6 Parameters (Billion): 65 |
| math-word-problem-solving-on-math | LLaMA 33B | Accuracy: 7.1 Parameters (Billion): 33 |
| math-word-problem-solving-on-math | LLaMA 65B-maj1@k | Accuracy: 20.5 Parameters (Billion): 65 |
| math-word-problem-solving-on-math | LLaMA 33B-maj1@k | Accuracy: 15.2 Parameters (Billion): 33 |
| multi-task-language-understanding-on-mmlu | LLaMA 65B (fine-tuned) | Average (%): 68.9 |
| multi-task-language-understanding-on-mmlu | LLaMA 65B (5-shot) | Average (%): 63.4 |
| multi-task-language-understanding-on-mmlu | LLaMA 33B (5-shot) | Average (%): 57.8 |
| question-answering-on-boolq | LLaMA 7B (zero-shot) | Accuracy: 76.5 |
| question-answering-on-boolq | LLaMA 65B (zero-shot) | Accuracy: 85.3 |
| question-answering-on-boolq | LLaMA 33B (zero-shot) | Accuracy: 83.1 |
| question-answering-on-boolq | LLaMA 13B (zero-shot) | Accuracy: 78.1 |
| question-answering-on-natural-questions | LLaMA 65B (few-shot, k=5) | EM: 35.0 |
| question-answering-on-natural-questions | LLaMA 65B (few-shot, k=64) | EM: 39.9 |
| question-answering-on-natural-questions | LLaMA 33B (zero-shot) | EM: 24.9 |
| question-answering-on-natural-questions | LLaMA 65B (one-shot) | EM: 31.0 |
| question-answering-on-obqa | LLaMA 7B (zero-shot) | Accuracy: 57.2 |
| question-answering-on-obqa | LLaMA 13B (zero-shot) | Accuracy: 56.4 |
| question-answering-on-obqa | LLaMA 65B (zero-shot) | Accuracy: 60.2 |
| question-answering-on-obqa | LLaMA 33B (zero-shot) | Accuracy: 58.6 |
| question-answering-on-piqa | LLaMA 33B (0-shot) | Accuracy: 82.3 |
| question-answering-on-piqa | LLaMA 7B (0-shot) | Accuracy: 79.8 |
| question-answering-on-piqa | LLaMA 13B (0-shot) | Accuracy: 80.1 |
| question-answering-on-piqa | LLaMA 65B (0-shot) | Accuracy: 82.8 |
| question-answering-on-social-iqa | LLaMA 13B (zero-shot) | Accuracy: 50.4 |
| question-answering-on-social-iqa | LLaMA 7B (zero-shot) | Accuracy: 48.9 |
| question-answering-on-social-iqa | LLaMA 65B (zero-shot) | Accuracy: 52.3 |
| question-answering-on-social-iqa | LLaMA 33B (zero-shot) | Accuracy: 50.4 |
| question-answering-on-timequestions | Llama3 | P@1: 17.8 |
| question-answering-on-triviaqa | LLaMA 65B (few-shot, k=64) | EM: 73.0 |
| question-answering-on-triviaqa | LLaMA 65B (one-shot) | EM: 71.6 |
| question-answering-on-triviaqa | LLaMA 65B (few-shot, k=5) | EM: 72.6 |
| question-answering-on-triviaqa | LLaMA 65B (zero-shot) | EM: 68.2 |
| question-answering-on-truthfulqa | LLaMA 65B | % info: 53 % true: 57 |
| question-answering-on-truthfulqa | LLaMA 7B | % info: 29 % true: 33 |
| question-answering-on-truthfulqa | LLaMA 13B | % info: 41 % true: 47 |
| question-answering-on-truthfulqa | LLaMA 33B | % info: 48 % true: 52 |
| reading-comprehension-on-race | LLaMA 33B (zero-shot) | Accuracy (High): 48.3 Accuracy (Middle): 64.1 |
| reading-comprehension-on-race | LLaMA 65B (zero-shot) | Accuracy (High): 51.6 Accuracy (Middle): 67.9 |
| reading-comprehension-on-race | LLaMA 7B (zero-shot) | Accuracy (High): 46.9 Accuracy (Middle): 61.1 |
| reading-comprehension-on-race | LLaMA 13B (zero-shot) | Accuracy (High): 47.2 Accuracy (Middle): 61.6 |
| stereotypical-bias-analysis-on-crows-pairs | LLaMA 65B | Age: 70.1 Disability: 66.7 Gender: 70.6 Nationality: 64.2 Overall: 66.6 Physical Appearance: 77.8 Race/Color: 57.0 Religion: 70.6 Sexual Orientation: 81.0 Socioeconomic status: 71.5 |
| zero-shot-learning-on-medconceptsqa | meta-llama/Meta-Llama-3-8B-Instruct | Accuracy: 25.840 |
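A note on the maj1@k rows above (GSM8K and MATH): rather than scoring a single greedy completion, k solutions are sampled per problem and the most frequent final answer is taken as the prediction (majority voting, i.e. self-consistency). Below is a minimal sketch of that scoring rule, assuming each problem's k completions have already been parsed down to final-answer strings; the data is invented purely for illustration.

```python
from collections import Counter

def maj1_at_k(sampled_answers: list[str], gold: str) -> bool:
    """Majority vote over k sampled answers for one problem: the
    prediction is the most common answer; ties resolve arbitrarily."""
    prediction, _ = Counter(sampled_answers).most_common(1)[0]
    return prediction == gold

# Hypothetical data: 3 problems, k=5 samples each (each sample already
# reduced to its final answer string).
samples = [["42", "42", "41", "42", "40"],
           ["7", "8", "8", "8", "7"],
           ["13", "12", "12", "10", "9"]]
golds = ["42", "8", "13"]

accuracy = sum(maj1_at_k(s, g) for s, g in zip(samples, golds)) / len(golds)
print(f"maj1@k accuracy: {accuracy:.1%}")  # 2/3 correct -> 66.7%
```

This is why the maj1@k rows consistently beat their single-sample counterparts: individual samples are noisy, but correct final answers tend to recur across samples more often than any particular wrong one.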