MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning

Ke Wang Houxing Ren Aojun Zhou Zimu Lu Sichun Luo Weikang Shi Renrui Zhang Linqi Song Mingjie Zhan Hongsheng Li

Abstract

The recently released GPT-4 Code Interpreter has demonstrated remarkable proficiency in solving challenging math problems, primarily attributed to its ability to seamlessly reason with natural language, generate code, execute code, and continue reasoning based on the execution output. In this paper, we present a method to fine-tune open-source language models, enabling them to use code for modeling and deriving math equations and, consequently, enhancing their mathematical reasoning abilities. We propose a method of generating novel and high-quality datasets with math problems and their code-based solutions, referred to as MathCodeInstruct. Each solution interleaves natural language, code, and execution results. We also introduce a customized supervised fine-tuning and inference approach. This approach yields the MathCoder models, a family of models capable of generating code-based solutions for solving challenging math problems. Impressively, the MathCoder models achieve state-of-the-art scores among open-source LLMs on the MATH (45.2%) and GSM8K (83.9%) datasets, substantially outperforming other open-source alternatives. Notably, the MathCoder model not only surpasses ChatGPT-3.5 and PaLM-2 on GSM8K and MATH but also outperforms GPT-4 on the competition-level MATH dataset. The dataset and models will be released at https://github.com/mathllm/MathCoder.
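The interleaving of natural language, code, and execution results described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Block` type, the `solve` loop, and the `scripted_model` stand-in (which replays a fixed solution instead of calling a fine-tuned LLM) are all hypothetical names introduced for illustration, and the paper's actual block delimiters and sandboxing are not shown here.

```python
import contextlib
import io
from dataclasses import dataclass


@dataclass
class Block:
    kind: str      # "text", "code", or "execution"
    content: str


def run_code(source, namespace):
    """Execute `source` and return what it printed, or the error message."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(source, namespace)  # illustration only: no sandboxing
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue().strip()


def solve(generate_next, max_steps=8):
    """Alternate between model generation and code execution."""
    history = []
    namespace = {}  # shared so later code blocks can reuse earlier variables
    for _ in range(max_steps):
        block = generate_next(history)
        if block is None:  # the model signals that the solution is complete
            break
        history.append(block)
        if block.kind == "code":
            # Feed the execution result back so the model can keep reasoning.
            history.append(Block("execution", run_code(block.content, namespace)))
    return history


def scripted_model(history):
    """Hypothetical stand-in for a fine-tuned LLM: replays a fixed solution."""
    script = [
        Block("text", "Let x be the number; solve 3x + 5 = 20."),
        Block("code", "x = (20 - 5) / 3\nprint(x)"),
    ]
    produced = [b for b in history if b.kind != "execution"]
    if len(produced) < len(script):
        return script[len(produced)]
    if len(produced) == len(script):
        # Continue reasoning from the most recent execution output.
        return Block("text", f"The answer is {history[-1].content}.")
    return None


solution = solve(scripted_model)
for block in solution:
    print(f"[{block.kind}] {block.content}")
```

Replacing `scripted_model` with a call to a real model would keep the same loop shape: each code block is executed as soon as it is produced, and its output is appended to the context before generation resumes.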

Code Repositories

mathllm/mathcoder
Official
Mentioned in GitHub

Benchmarks

| Benchmark | Methodology | Metrics |
|---|---|---|
| arithmetic-reasoning-on-gsm8k | MathCoder-CL-13B | Accuracy: 74.1; Parameters (Billion): 13 |
| arithmetic-reasoning-on-gsm8k | MathCoder-CL-7B | Accuracy: 67.8; Parameters (Billion): 7 |
| arithmetic-reasoning-on-gsm8k | MathCoder-L-13B | Accuracy: 72.6; Parameters (Billion): 13 |
| arithmetic-reasoning-on-gsm8k | MathCoder-L-70B | Accuracy: 83.9; Parameters (Billion): 70 |
| arithmetic-reasoning-on-gsm8k | MathCoder-CL-34B | Accuracy: 81.7; Parameters (Billion): 34 |
| arithmetic-reasoning-on-gsm8k | MathCoder-L-7B | Accuracy: 64.2; Parameters (Billion): 7 |
| math-word-problem-solving-on-math | MathCoder-CL-7B | Accuracy: 30.2; Parameters (Billion): 7 |
| math-word-problem-solving-on-math | MathCoder-CL-34B | Accuracy: 45.2; Parameters (Billion): 34 |
| math-word-problem-solving-on-math | MathCoder-CL-13B | Accuracy: 35.9; Parameters (Billion): 13 |
| math-word-problem-solving-on-math | MathCoder-L-34B | Accuracy: 45.1; Parameters (Billion): 34 |
| math-word-problem-solving-on-math | MathCoder-L-7B | Accuracy: 23.3; Parameters (Billion): 7 |
| math-word-problem-solving-on-math | MathCoder-L-13B | Accuracy: 29.9; Parameters (Billion): 13 |
| math-word-problem-solving-on-svamp | MathCoder-L-70B | Execution Accuracy: 84.9 |
