
Llama 2: Open Foundation and Fine-Tuned Chat Models

Abstract

In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned models, called Llama 2-Chat, are optimized for dialogue use cases. These models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations of helpfulness and safety, they may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and to contribute to the responsible development of LLMs.
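
The released Llama 2-Chat checkpoints behave as ordinary causal language models and can be queried through standard tooling. Below is a minimal sketch of a single dialogue turn using Hugging Face transformers; the checkpoint name, system prompt, and generation settings are illustrative assumptions, not details taken from this page.

```python
# Minimal sketch: one dialogue turn with a Llama 2-Chat checkpoint.
# Assumes access to the gated meta-llama/Llama-2-7b-chat-hf weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Llama 2-Chat expects the instruction-tuning prompt format:
# [INST] <<SYS>> system prompt <</SYS>> user message [/INST]
prompt = (
    "[INST] <<SYS>>\nYou are a helpful, honest assistant.\n<</SYS>>\n\n"
    "Explain what a fine-tuned chat model is in one sentence. [/INST]"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The [INST]/<<SYS>> markers follow the chat prompt format documented for the fine-tuned Llama 2 models; the plain (non-chat) base checkpoints take free-form text instead.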

Code Repositories

The following repositories mention this paper on GitHub (framework tag in parentheses):

xverse-ai/xverse-13b (pytorch)
coastalcph/eu-politics-llms (pytorch)
IBM/Dromedary (pytorch)
squeezeailab/squeezellm (pytorch)
zurichnlp/contradecode (pytorch)
xuetianci/pacit (pytorch)
young-geng/easylm (jax)
llamafamily/llama-chinese (pytorch)
glb400/Toy-RecLM (pytorch)
rijgersberg/geitje (pytorch)
flagalpha/llama2-chinese (pytorch)
usyd-fsalab/fp6_llm (pytorch)
idiap/abroad-re (pytorch)
ninglab/ecellm (pytorch)
Lightning-AI/lit-gpt (pytorch)
xzhang97666/alpacare

Benchmarks

| Benchmark | Method | Metrics |
|---|---|---|
| arithmetic-reasoning-on-gsm8k | LLaMA 2 70B (one-shot) | Accuracy: 56.8; Parameters (Billion): 70 |
| code-generation-on-mbpp | Llama 2 34B (0-shot) | Accuracy: 33 |
| code-generation-on-mbpp | Llama 2 7B (0-shot) | Accuracy: 20.8 |
| code-generation-on-mbpp | Llama 2 70B (0-shot) | Accuracy: 45 |
| code-generation-on-mbpp | Llama 2 13B (0-shot) | Accuracy: 30.6 |
| math-word-problem-solving-on-mawps | LLaMA 2-Chat | Accuracy (%): 82.4 |
| math-word-problem-solving-on-svamp | LLaMA 2-Chat | Execution Accuracy: 69.2 |
| multi-task-language-understanding-on-mmlu | LLaMA 2 13B (5-shot) | Average (%): 54.8 |
| multi-task-language-understanding-on-mmlu | LLaMA 2 34B (5-shot) | Average (%): 62.6 |
| multi-task-language-understanding-on-mmlu | LLaMA 2 7B (5-shot) | Average (%): 45.3 |
| multiple-choice-question-answering-mcqa-on-25 | Llama2-7B | Accuracy: 43.38 |
| multiple-choice-question-answering-mcqa-on-25 | Llama2-7B-chat | Accuracy: 40.07 |
| question-answering-on-boolq | LLaMA 2 13B (0-shot) | Accuracy: 81.7 |
| question-answering-on-boolq | LLaMA 2 34B (0-shot) | Accuracy: 83.7 |
| question-answering-on-boolq | LLaMA 2 7B (0-shot) | Accuracy: 77.4 |
| question-answering-on-boolq | LLaMA 2 70B (0-shot) | Accuracy: 85 |
| question-answering-on-multitq | LLaMA2 | Hits@1: 18.5 |
| question-answering-on-natural-questions | LLaMA 2 70B (one-shot) | EM: 33.0 |
| question-answering-on-piqa | LLaMA 2 13B (0-shot) | Accuracy: 80.5 |
| question-answering-on-piqa | LLaMA 2 34B (0-shot) | Accuracy: 81.9 |
| question-answering-on-piqa | LLaMA 2 7B (0-shot) | Accuracy: 78.8 |
| question-answering-on-piqa | LLaMA 2 70B (0-shot) | Accuracy: 82.8 |
| question-answering-on-pubchemqa | Llama2-7B-chat | BLEU-2: 0.075; BLEU-4: 0.009; METEOR: 0.149; ROUGE-1: 0.184; ROUGE-2: 0.043; ROUGE-L: 0.142 |
| question-answering-on-triviaqa | LLaMA 2 70B (one-shot) | EM: 85 |
| question-answering-on-uniprotqa | Llama2-7B-chat | BLEU-2: 0.019; BLEU-4: 0.002; METEOR: 0.052; ROUGE-1: 0.103; ROUGE-2: 0.060; ROUGE-L: 0.009 |
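
Several rows above report exact match (EM). As a reference point, here is a minimal sketch of the SQuAD-style EM convention commonly used for TriviaQA and Natural Questions; the normalization rules are an assumption about the leaderboard's scoring, not something stated on this page.

```python
# Sketch of exact-match (EM) scoring under common SQuAD-style normalization:
# lowercase, strip punctuation, drop articles, collapse whitespace.
import re
import string

def normalize(text: str) -> str:
    """Lowercase, remove punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, references: list[str]) -> bool:
    """Score 1 if the prediction matches any reference after normalization."""
    return any(normalize(prediction) == normalize(ref) for ref in references)

# Tiny illustration; reported EM values average over the full test set.
preds = ["The Eiffel Tower", "paris"]
refs = [["Eiffel Tower"], ["Paris, France"]]
em = 100 * sum(exact_match(p, r) for p, r in zip(preds, refs)) / len(preds)
print(f"EM: {em:.1f}")  # -> EM: 50.0
```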

