Mathematical Reasoning on AIME24

Evaluation Metrics

Accuracy (Acc)

Evaluation Results

Performance of each model on this benchmark

| Model | Acc | Paper Title | Repository |
| --- | --- | --- | --- |
| DeepSeek-r1 | 79.8 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | |
| Openai-o1 | 74.4 | - | - |
| Openai-o1-mini | 70.0 | - | - |
| Search-o1 | 56.7 | Search-o1: Agentic Search-Enhanced Large Reasoning Models | |
| s1-32B | 56.7 | s1: Simple test-time scaling | |
| Openai-o1-preview | 44.6 | - | - |
| Qwen2.5-72B-Instruct | 23.3 | Qwen2.5 Technical Report | |
| Claude3.5-Sonnet | 16 | - | - |