PECC: Problem Extraction and Coding Challenges

Patrick Haller, Jonas Golde, Alan Akbik

Abstract

Recent advancements in large language models (LLMs) have showcased their exceptional abilities across various tasks, such as code generation, problem-solving, and reasoning. Existing benchmarks evaluate these tasks in isolation, yet the extent to which LLMs can understand prose-style tasks, identify the underlying problems, and then generate appropriate code solutions is still unexplored. Addressing this gap, we introduce PECC, a novel benchmark of 2396 problems derived from Advent of Code (AoC) challenges and Project Euler. Unlike conventional benchmarks, PECC requires LLMs to interpret narrative-embedded problems, extract requirements, and generate executable code. A key feature of our dataset is the complexity added by natural language prompting in chat-based evaluations, mirroring real-world instruction ambiguities. Results show that model performance varies between narrative and neutral problems, and that the math-based Euler subset is particularly challenging: GPT-3.5-Turbo passes 50% of the AoC challenges but only 8% of the Euler problems. By probing the limits of LLMs' capabilities, our benchmark provides a framework to monitor and assess the subsequent progress of LLMs as universal problem solvers.

Code Repositories

hallerpatrick/pecc (official; mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
code-generation-on-pecc | Llama-3-8B-Instruct | Pass@3: 3.1
code-generation-on-pecc | Claude 3 Haiku | Pass@3: 27.67
code-generation-on-pecc | chat-bison | Pass@3: 8.48
code-generation-on-pecc | GPT-3.5 Turbo | Pass@3: 23.75
code-generation-on-pecc | WizardLM-2-7B | Pass@3: 3.72
code-generation-on-pecc | Mixtral-8x7B-Instruct | Pass@3: 8.35
code-generation-on-pecc | codechat-bison | Pass@3: 11.39
code-generation-on-pecc | Phi-3-mini-128k-instruct | Pass@3: 7.18
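
The page does not define how Pass@3 is computed. As a rough illustration only, the Python sketch below assumes the common reading that a problem counts as solved if at least one of up to three generated programs produces the expected answer, and that the score is the percentage of solved problems. The function name `pass_at_k`, the problem identifiers, and the outcome data are all hypothetical and are not taken from the PECC repository.

```python
from typing import Dict, List


def pass_at_k(results: Dict[str, List[bool]], k: int = 3) -> float:
    """Percentage of problems solved by at least one of the first k attempts.

    Assumption: this is the simple "any of k attempts" reading of Pass@k,
    not necessarily the exact procedure used to produce the table above.
    """
    if not results:
        raise ValueError("results must contain at least one problem")
    solved = sum(1 for attempts in results.values() if any(attempts[:k]))
    return 100.0 * solved / len(results)


# Hypothetical outcomes: True means the generated program gave the expected answer.
example_results = {
    "aoc-problem-a": [False, True, False],    # solved on the second attempt
    "aoc-problem-b": [False, False, False],   # never solved
    "euler-problem-a": [True, True, True],    # solved on the first attempt
    "euler-problem-b": [False, False, False], # never solved
}

print(f"Pass@3: {pass_at_k(example_results):.2f}")
```

Under this reading, a score such as 27.67 would mean that roughly 28 of every 100 problems were solved within three attempts. Other definitions exist, for example estimators that sample more than k programs per problem, so the numbers in the table should be interpreted according to the paper itself.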
