
Abstract
We introduce Mistral 7B v0.1, a 7-billion-parameter language model designed for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B on all evaluated benchmarks and surpasses Llama 1 34B in reasoning, mathematics, and code generation. The model uses grouped-query attention (GQA) for faster inference, combined with sliding window attention (SWA) to handle sequences of arbitrary length effectively while reducing inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B -- Instruct, which surpasses the Llama 2 13B -- Chat model on both human and automated benchmarks. Our models are released under the Apache 2.0 license.
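To make the two attention mechanisms named above concrete, here is a minimal PyTorch sketch of how grouped-query attention (several query heads sharing one key/value head) can be combined with a sliding-window causal mask (each position attends only to the previous `window` tokens). This is an illustrative toy, not Mistral's actual implementation: the function name `gqa_swa_attention` and all dimensions are hypothetical, and real inference code would use a rolling KV cache rather than materializing the full score matrix.

```python
# Minimal sketch of GQA + SWA, assuming illustrative shapes
# (not Mistral 7B's real configuration or codebase).
import torch
import torch.nn.functional as F

def gqa_swa_attention(q, k, v, window: int):
    """q: (B, Hq, T, D); k, v: (B, Hkv, T, D), with Hq a multiple of Hkv."""
    B, Hq, T, D = q.shape
    Hkv = k.shape[1]
    # GQA: repeat each key/value head so a group of query heads shares it.
    k = k.repeat_interleave(Hq // Hkv, dim=1)
    v = v.repeat_interleave(Hq // Hkv, dim=1)
    scores = q @ k.transpose(-2, -1) / D ** 0.5  # (B, Hq, T, T)
    # SWA: position i may attend only to positions j with i - window < j <= i.
    i = torch.arange(T).unsqueeze(1)
    j = torch.arange(T).unsqueeze(0)
    mask = (j <= i) & (j > i - window)
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Toy usage: 8 query heads sharing 2 KV heads, window of 4 tokens.
q = torch.randn(1, 8, 16, 32)
k = torch.randn(1, 2, 16, 32)
v = torch.randn(1, 2, 16, 32)
out = gqa_swa_attention(q, k, v, window=4)
print(out.shape)  # torch.Size([1, 8, 16, 32])
```

Sharing KV heads shrinks the KV cache and speeds up decoding, while the sliding-window mask bounds per-token attention cost regardless of sequence length; stacked layers still see a receptive field that grows with depth.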
Code Repositories

| Repository | Framework | Note |
|---|---|---|
| mgmalek/efficient_cross_entropy | pytorch | Mentioned in GitHub |
| mistralai/mistral-src | pytorch | Official |
| pwc-1/Paper-9/tree/main/2/mistral | mindspore | — |
| facebookresearch/fairseq2 | pytorch | Mentioned in GitHub |
| knowlab/bi-weekly-paper-presentation | — | Mentioned in GitHub |
| ninglab/ecellm | pytorch | Mentioned in GitHub |
Benchmarks

| Benchmark | Method | Metrics |
|---|---|---|
| answerability-prediction-on-peerqa | Mistral-IT-v02-7B-32k | Macro F1: 0.4703 |
| arithmetic-reasoning-on-gsm8k | Mistral 7B (maj@8) | Accuracy: 52.2; Parameters (Billions): 7 |
| code-generation-on-mbpp | Mistral 7B (3-shot) | Accuracy: 47.5 |
| common-sense-reasoning-on-arc-challenge | Mistral 7B (0-shot) | Accuracy: 55.5 |
| common-sense-reasoning-on-arc-easy | Mistral 7B (0-shot) | Accuracy: 80.0 |
| common-sense-reasoning-on-winogrande | Mistral 7B (0-shot) | Accuracy: 75.3 |
| math-word-problem-solving-on-math | Mistral 7B (maj@4) | Accuracy: 13.1; Parameters (Billions): 7 |
| multi-task-language-understanding-on-mmlu | Mistral 7B (5-shot) | Average (%): 60.1 |
| question-answering-on-natural-questions | Mistral 7B (5-shot) | EM: 28.8 |
| question-answering-on-peerqa | Mistral-v02-7B-32k | AlignScore: 0.0827; Prometheus-2 Answer Correctness: 3.4245; Rouge-L: 0.1922 |
| question-answering-on-piqa | Mistral 7B (0-shot) | Accuracy: 83.0 |
| question-answering-on-triviaqa | Mistral 7B (5-shot) | EM: 69.9 |
| zero-shot-video-question-answer-on-intentqa | Mistral (7B) | Accuracy: 50.4 |
| zero-shot-video-question-answer-on-next-gqa | Mistral (7B) | Acc@GQA: 9.2 |
| zero-shot-video-question-answer-on-next-qa | Mistral (7B) | Accuracy: 51.1 |