Code Generation On Mbpp

Evaluation Metric

Accuracy
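The page names the metric only as "Accuracy". On MBPP it is usually reported as the percentage of problems for which the model's generated program passes all of the problem's test assertions, with one sample per problem (pass@1). The sketch below assumes that definition. The function names `passes_all_tests` and `accuracy` are hypothetical and are not taken from any official MBPP harness. The sketch also omits the sandboxing and timeouts that a real evaluator would need for untrusted code.

```python
from typing import Sequence, Tuple


def passes_all_tests(program: str, tests: Sequence[str]) -> bool:
    """Return True if `program` runs and every assert-style test passes.

    Hypothetical helper: each problem gets a fresh namespace. There is no
    sandbox and no timeout here; real harnesses isolate and time-limit code.
    """
    namespace: dict = {}
    try:
        exec(program, namespace)    # define the candidate function(s)
        for test in tests:
            exec(test, namespace)   # e.g. "assert add(1, 2) == 3"
    except Exception:               # AssertionError, SyntaxError, runtime errors
        return False
    return True


def accuracy(results: Sequence[Tuple[str, Sequence[str]]]) -> float:
    """Percentage of problems whose single generated program passes all tests."""
    if not results:
        return 0.0
    solved = sum(passes_all_tests(prog, tests) for prog, tests in results)
    return 100.0 * solved / len(results)


if __name__ == "__main__":
    # Hypothetical model outputs for two toy MBPP-style problems.
    toy = [
        ("def add(a, b):\n    return a + b",
         ["assert add(1, 2) == 3", "assert add(-1, 1) == 0"]),
        ("def is_even(n):\n    return n % 2 == 1",   # buggy candidate
         ["assert is_even(4) == True"]),
    ]
    print(f"{accuracy(toy):.1f}")  # 50.0
```

One problem is solved and one fails its assertion, so the score is 50.0. Leaderboard entries that report a spread, such as "83.8 ± 0.6", presumably average this figure over several runs.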

Evaluation Results

Results of each model on this benchmark, ranked by accuracy:

| Model | Accuracy (%) | Paper Title | Repository |
| --- | --- | --- | --- |
| QualityFlow (Sonnet-3.5) | 94.2 | QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks | - |
| o1-mini + MapCoder | 93.2 | MapCoder: Multi-Agent Code Generation for Competitive Problem Solving | |
| GPT-4 + AgentCoder | 91.8 | AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation | |
| CodeSim (GPT4o) | 90.7 | CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging | |
| Jiutian-大模型 | 90.0 | - | - |
| GPT-3.5 Turbo (ChatGPT) + AgentCoder | 89.9 | AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation | |
| MapCoder (GPT-4o) | 89.7 | MapCoder: Multi-Agent Code Generation for Competitive Problem Solving | |
| GPT-4 (ChatGPT Plus) | 87.5 | How Does Naming Affect LLMs on Code Analysis Tasks? | - |
| Claude 3 Opus | 86.4 | The Claude 3 Model Family: Opus, Sonnet, Haiku | - |
| LPW (GPT-4o) | 84.8 | Planning-Driven Programming: A Large Language Model Programming Workflow | |
| GPT-3.5 Turbo + FlowGenScrum + Test | 83.8 ± 0.6 | SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents | - |
| AFlow (GPT-4o-mini) | 83.4 | AFlow: Automating Agentic Workflow Generation | |
| GPT-3.5 Turbo (ChatGPT) | 83.2 | How Does Naming Affect LLMs on Code Analysis Tasks? | - |
| MapCoder (GPT-4) | 83.1 | MapCoder: Multi-Agent Code Generation for Competitive Problem Solving | |
| o1-mini + Language Agent Tree Search (Hamming.ai) | 82.3 | Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models | |
| GPT-4 (Bing Chat) | 82 | How Does Naming Affect LLMs on Code Analysis Tasks? | - |
| GPT-3.5 Turbo + Language Agent Tree Search | 81.1 | Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models | |
| MGDebugger (CodeQwen1.5) | 80.8 | From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging | |
| Claude 3 Haiku | 80.4 | The Claude 3 Model Family: Opus, Sonnet, Haiku | - |
| GPT-4 (Self-Debugging with unit tests + trace) | 80.2 | Teaching Large Language Models to Self-Debug | |
Code Generation On Mbpp | SOTA | HyperAI超神经