| Qwen2-Math-72B-Instruct (greedy) | 96.7 | 72 | Qwen2 Technical Report | |
| SFT-Mistral-7B (MetaMath, OVM, Smart Ensemble) | 96.4 | 7 | - | - |
| DAMOMath-7B (MetaMath, OVM, BS, Ensemble) | 95.1 | 7 | - | - |
| Claude 3 Opus (0-shot chain-of-thought) | 95.0 | - | The Claude 3 Model Family: Opus, Sonnet, Haiku | - |
| SFT-Mistral-7B (MetaMath + OVM + Ensemble) | 94.13 | 7 | - | - |
| Qwen2-72B-Instruct-Step-DPO (0-shot CoT) | 94.0 | - | Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs | |
| DAMOMath-7B (MetaMath, OVM, Ensemble) | 93.2 | 7 | - | - |
| Claude 3 Sonnet (0-shot chain-of-thought) | 92.3 | - | The Claude 3 Model Family: Opus, Sonnet, Haiku | - |
| PaLM 2 (few-shot, k=8, SC) | 91.0 | - | PaLM 2 Technical Report | |
| GaC (Qwen2-72B-Instruct + Llama-3-70B-Instruct) | 90.91 | - | Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling | |
| OpenMath-CodeLlama-70B (w/ code, SC, k=50) | 90.8 | 70 | OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset | |
| DART-Math-Llama3-70B-Uniform (0-shot CoT, w/o code) | 90.4 | 70 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | |