Math Word Problem Solving on SVAMP

Evaluation Metrics

Execution Accuracy
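The page names the metric without defining it. As a rough sketch, assuming execution accuracy means the percentage of problems for which executing the model's predicted arithmetic expression yields the gold answer, it could be computed as below. The function names `execute` and `execution_accuracy`, the tolerance `tol`, and the sample inputs are hypothetical and do not come from the benchmark or from any listed paper.

```python
import ast
import operator

# Binary operators allowed in a predicted expression (assumption: plain arithmetic only).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def execute(expression):
    """Evaluate a plain arithmetic expression; return None if it cannot be executed."""

    def _eval(node):
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return float(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError("unsupported expression")

    try:
        return _eval(ast.parse(expression, mode="eval").body)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return None


def execution_accuracy(predictions, gold_answers, tol=1e-4):
    """Percentage of predictions whose executed value matches the gold answer."""
    if not gold_answers:
        return 0.0
    correct = 0
    for expr, gold in zip(predictions, gold_answers):
        value = execute(expr)
        # A prediction that fails to execute counts as wrong.
        if value is not None and abs(value - gold) <= tol:
            correct += 1
    return 100.0 * correct / len(gold_answers)


# Made-up predictions: two correct, one division by zero, one wrong value.
print(execution_accuracy(["8 - 3", "4 * 6", "10 / 0", "7 + 2"], [5, 24, 2, 10]))  # 50.0
```

Under this reading, a score such as 93.9 would mean that about 94 of every 100 test problems produced the correct numeric answer when the model's output was executed. Individual papers may extract or execute answers differently, so scores are only comparable to the extent their evaluation setups match.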

Evaluation Results

Performance of each model on this benchmark

| Model | Execution Accuracy | Paper Title |
| --- | --- | --- |
| GPT-4 (Teaching-Inspired) | 93.9 | Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models |
| GPT-4 (Model Selection) | 93.7 | Automatic Model Selection with Large Language Models for Reasoning |
| Qwen2 (CoT + Code Interpreter) | 92.3 | - |
| GPT-4 (PHP) | 91.9 | Progressive-Hint Prompting Improves Reasoning in Large Language Models |
| OpenMath-CodeLlama-70B (w/ code) | 87.8 | OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset |
| MathCoder-L-70B | 84.9 | MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning |
| MMOS-CODE-34B (0-shot) | 80.6 | An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning |
| MMOS-DeepSeekMath-7B (0-shot) | 79.3 | An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning |
| MMOS-CODE-7B (0-shot) | 76.4 | An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning |
| LLaMA 2-Chat | 69.2 | Llama 2: Open Foundation and Fine-Tuned Chat Models |
| DeBERTa | 63.5 | Math Word Problem Solving by Generating Linguistic Variants of Problem Statements |
| PaLM (zero-shot, CoT) | 62.1 | Large Language Models are Zero-Shot Reasoners |
| PaLM (zero-shot) | 58.8 | Large Language Models are Zero-Shot Reasoners |
| SYRELM (Vicuna 13B) | 56.65 | Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning |
| ATHENA (roberta-large) | 54.8 | ATHENA: Mathematical Reasoning with Thought Expansion |
| MsAT-DeductReasoner | 48.9 | Learning Multi-Step Reasoning by Solving Arithmetic Tasks |
| Roberta-DeductReasoner | 47.3 | Learning to Reason Deductively: Math Word Problem Solving as Complex Relation Extraction |
| ATHENA (roberta-base) | 45.6 | ATHENA: Mathematical Reasoning with Thought Expansion |
| Graph2Tree with RoBERTa | 43.8 | Are NLP Models really able to Solve Simple Math Word Problems? |
| GTS with RoBERTa | 41.0 | Are NLP Models really able to Solve Simple Math Word Problems? |