3 months ago

OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset


Abstract

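Recent work has shown the immense potential of synthetically generated datasets for training large language models (LLMs), especially for acquiring targeted skills. Current large-scale math instruction tuning datasets such as MetaMathQA (Yu et al., 2024) and MAmmoTH (Yue et al., 2024) are constructed using outputs from closed-source LLMs with commercially restrictive licenses. A key reason limiting the use of open-source LLMs in these data generation pipelines has been the wide gap between the mathematical skills of the best closed-source LLMs, such as GPT-4, and the best open-source LLMs. Building on the recent progress in open-source LLMs, our proposed prompting novelty, and some brute-force scaling, we construct OpenMathInstruct-1, a math instruction tuning dataset with 1.8M problem-solution pairs. The dataset is constructed by synthesizing code-interpreter solutions for GSM8K and MATH, two popular math reasoning benchmarks, using the recently released and permissively licensed Mixtral model. Our best model, OpenMath-CodeLlama-70B, trained on a subset of OpenMathInstruct-1, achieves 84.6% on GSM8K and 50.7% on MATH, which is competitive with the best GPT-distilled models. We release our code, models, and the OpenMathInstruct-1 dataset under a commercially permissive license.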
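The abstract says the dataset was built by synthesizing code-interpreter solutions with Mixtral. One plausible way to turn sampled code-interpreter solutions into training pairs is to execute the code in each solution and keep only those whose executed answer matches the reference answer. The sketch below shows that filter under several assumptions that are not taken from the paper or the NeMo-Skills repository: the `<llm-code>` delimiters, the rule that the last code block's printed output is the final answer, and the hypothetical helpers `run_code`, `solution_answer`, and `filter_solutions`. It uses a bare `exec` with no sandboxing, so it is for illustration only.

```python
import contextlib
import io
import re
from typing import List, Optional

# Hypothetical delimiters marking the executable part of a solution
# (an assumption; the released dataset format may differ).
CODE_RE = re.compile(r"<llm-code>(.*?)</llm-code>", re.DOTALL)


def run_code(code: str, namespace: dict) -> str:
    """Execute one code block and return what it printed.

    The shared `namespace` lets later blocks reuse variables from earlier
    ones, as in an interactive interpreter. No sandboxing: illustration only.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)
    return buffer.getvalue().strip()


def solution_answer(solution: str) -> Optional[str]:
    """Run every code block in order; assume the last block's output is the answer."""
    namespace: dict = {}
    output = None
    for code in CODE_RE.findall(solution):
        try:
            output = run_code(code, namespace)
        except Exception:
            return None  # a crashing solution produces no answer
    return output


def filter_solutions(samples: List[str], reference: str) -> List[str]:
    """Keep only the sampled solutions whose executed answer equals the reference."""
    return [s for s in samples if solution_answer(s) == reference]
```

For example, a sampled solution whose code block is `print(3 * 4)` is kept when the reference answer is `"12"`, while a solution whose code raises an exception yields no answer and is dropped. Exact string matching is a simplification; a real pipeline would likely normalize numeric and symbolic answers before comparing.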

Code repository

kipok/nemo-skills (official, mentioned on GitHub)

Benchmarks

GSM8K (arithmetic-reasoning-on-gsm8k), metric: Accuracy
Model                  | w/ code | w/ code, SC, k=50 | Parameters (Billion)
OpenMath-Mistral-7B    | 80.2    | 86.9              | 7
OpenMath-CodeLlama-7B  | 75.9    | 84.8              | 7
OpenMath-CodeLlama-13B | 78.8    | 86.8              | 13
OpenMath-CodeLlama-34B | 80.7    | 88.0              | 34
OpenMath-CodeLlama-70B | 84.6    | 90.8              | 70
OpenMath-Llama2-70B    | 84.7    | 90.1              | 70

MATH (math-word-problem-solving-on-math), metric: Accuracy
Model                  | w/ code | w/ code, SC, k=50 | Parameters (Billions)
OpenMath-Mistral-7B    | 44.5    | 57.2              | 7
OpenMath-CodeLlama-7B  | 43.6    | 55.6              | 7
OpenMath-CodeLlama-13B | 45.5    | 57.6              | 13
OpenMath-CodeLlama-34B | 48.3    | 60.2              | 34
OpenMath-CodeLlama-70B | 50.7    | 60.4              | 70
OpenMath-Llama2-70B    | 46.3    | 58.3              | 70

Other math word problem benchmarks, OpenMath-CodeLlama-70B (w/ code)
Benchmark                                      | Metric             | Score
ASDiv-A (math-word-problem-solving-on-asdiv-a) | Execution Accuracy | 84.7
MAWPS (math-word-problem-solving-on-mawps)     | Accuracy (%)       | 95.7
SVAMP (math-word-problem-solving-on-svamp)     | Execution Accuracy | 87.8
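The "SC, k=50" entries refer to self-consistency decoding: presumably 50 solutions are sampled per problem and the most frequent final answer is the one that gets scored. A minimal majority-voting sketch follows; the function names `majority_vote` and `sc_accuracy` are hypothetical, and representing a sample with no extractable answer as `None` is an assumption.

```python
from collections import Counter
from typing import List, Optional, Sequence


def majority_vote(answers: Sequence[Optional[str]]) -> Optional[str]:
    """Return the most frequent answer among k samples, ignoring failed ones.

    Ties are broken by first occurrence (Counter.most_common orders equal
    counts by the order in which elements were first encountered).
    """
    valid = [a for a in answers if a is not None]
    if not valid:
        return None
    return Counter(valid).most_common(1)[0][0]


def sc_accuracy(samples: List[List[Optional[str]]], references: List[str]) -> float:
    """Percentage of problems whose majority-voted answer matches the reference."""
    correct = sum(
        majority_vote(answers) == ref for answers, ref in zip(samples, references)
    )
    return 100.0 * correct / len(references)
```

In the reported results, the SC rows improve over the corresponding "w/ code" rows by about 5 to 9 points on GSM8K and about 10 to 13 points on MATH.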
