Math Word Problem Solving on MATH

Evaluation Metric

Accuracy
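
Accuracy on a benchmark like this is presumably the percentage of problems whose final answer matches the reference answer. The following is a minimal sketch under that assumption; the function name `accuracy` and the exact-match comparison after whitespace trimming are illustrative choices, not the scoring procedure used by this leaderboard or by any of the listed papers, which typically normalize mathematical expressions before comparing.

```python
def accuracy(predictions, references):
    """Return the percentage of predictions that match their reference answer.

    Assumption: answers are compared as strings after trimming whitespace.
    Real evaluations usually normalize expressions (e.g. fractions, LaTeX).
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)
```

With three problems of which two are answered correctly, this returns two thirds expressed as a percentage.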

Evaluation Results

Performance of each model on this benchmark

| Model | Accuracy | Paper Title | Repository |
| --- | --- | --- | --- |
| Gemini 2.0 Flash Experimental | 89.7 | - | - |
| Qwen2.5-Math-72B-Instruct (TIR, Greedy) | 88.1 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement | - |
| GPT-4 Turbo (MACM, w/code, voting) | 87.920 | MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems | |
| Qwen2.5-Math-72B-Instruct (COT, Greedy) | 85.9 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement | - |
| Qwen2.5-Math-7B-Instruct (TIR, Greedy) | 85.2 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement | - |
| GPT-4-code model (CSV, w/ code, SC, k=16) | 84.3 | Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification | |
| Qwen2-Math-72B-Instruct (greedy) | 84.0 | Qwen2 Technical Report | |
| Qwen2.5-Math-7B-Instruct (COT, Greedy) | 83.6 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement | - |
| Qwen2.5-Math-1.5B-Instruct (TIR, Greedy) | 79.9 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement | - |
| OpenMath2-Llama3.1-70B (majority@256) | 79.6 | OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data | |
| OpenMath2-Llama3.1-8B (majority@256) | 76.1 | OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data | |
| Qwen2.5-Math-1.5B-Instruct (COT, Greedy) | 75.8 | Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement | - |
| GPT-4-code model (CSV, w/ code) | 73.5 | Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification | |
| CR (GPT-4-turbo model, w/ code) | 72.2 | Cumulative Reasoning with Large Language Models | |
| OpenMath2-Llama3.1-70B | 71.9 | OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data | |
| LogicNet (with code interpreter) | 71.2 | Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification | |
| Qwen2-72B-Instruct-Step-DPO (0-shot CoT, w/o code) | 70.8 | Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs | |
| GPT-4-code model (w/ code) | 69.7 | Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification | |
| OpenMath2-Llama3.1-8B | 67.8 | OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data | |
| AlphaMath-7B-SBS@3 | 66.3 | AlphaMath Almost Zero: Process Supervision without Process | |