Automated Theorem Proving On Minif2F Test

评估指标

ITP
Pass@32
cumulative

评测结果

各个模型在此基准测试上的表现结果

Paper TitleRepository
Subgoal-XLIsabelle39.356.1SubgoalXL: Subgoal-based Expert Learning for Theorem Proving
EvaristeLean-41HyperTree Proof Search for Neural Theorem Proving-
Lean GPT-fLean29.229.2MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics
Lean Expert IterationLean34.536.6Formal Mathematics Statement Curriculum Learning
GPT-fMetamath-36.6HyperTree Proof Search for Neural Theorem Proving-
Evariste-1dLean-38.9HyperTree Proof Search for Neural Theorem Proving-
LLEMMA-7bLean26.226.2Llemma: An Open Language Model For Mathematics
Decomposing the EnigmaIsabelle-45.5Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving
COPRA + GPT-3.5Lean-11.9An In-Context Learning Agent for Formal Theorem-Proving
MMOS-DeepSeekMath-7BLean-28.3An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning
DeepSeek-Prover-V1.5Lean50.063.5DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
Lean tidyLean-18MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics
Metamath GPT-fMetamath-1.6MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics
COPRA + GPT-4Lean-23.3An In-Context Learning Agent for Formal Theorem-Proving
Thor + expert iteration on autoformalised theoremsIsabelle-35.2--
LLEMMA-34bLean25.825.8Llemma: An Open Language Model For Mathematics
ReProverLean26.526.5--
DeepSeek-ProverLean-52.0DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data-
SledgehammerIsabelle-10.4Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers-
DSP (540B Minerva informal)Isabelle-38.9Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs
0 of 28 row(s) selected.
Automated Theorem Proving On Minif2F Test | SOTA | HyperAI超神经