LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?

Abstract

Recent reports claim that large language models (LLMs) now outperform elite humans in competitive programming. Drawing on knowledge from a group of medalists in international algorithmic contests, we revisit this claim, examining how LLMs differ from human experts and where limitations still remain. We introduce LiveCodeBench Pro, a benchmark composed of problems from Codeforces, ICPC, and IOI that are continuously updated to reduce the likelihood of data contamination. A team of Olympiad medalists annotates every problem for algorithmic categories and conducts a line-by-line analysis of failed model-generated submissions. Using this new data and benchmark, we find that frontier models still have significant limitations: without external tools, the best model achieves only 53% pass@1 on medium-difficulty problems and 0% on hard problems, domains where expert humans still excel. We also find that LLMs succeed at implementation-heavy problems but struggle with nuanced algorithmic reasoning and complex case analysis, often generating confidently incorrect justifications. High performance appears largely driven by implementation precision and tool augmentation, not superior reasoning. LiveCodeBench Pro thus highlights the significant gap to human grandmaster levels, while offering fine-grained diagnostics to steer future improvements in code-centric LLM reasoning.
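
The abstract reports results as pass@1 but does not say how the metric is computed. As a rough illustration only, the sketch below uses the commonly used unbiased pass@k estimator, 1 - C(n-c, k)/C(n, k), which for k = 1 reduces to the fraction of correct samples. The function names `pass_at_k` and `benchmark_pass_at_1` and the input layout are hypothetical and are not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which passed.

    Assumption: this is the standard unbiased estimator, not necessarily
    the exact procedure used by the benchmark.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Fewer than k failing samples: every k-subset contains a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: dict[str, list[bool]]) -> float:
    """Average per-problem pass@1 over a hypothetical {problem_id: outcomes} map."""
    if not results:
        raise ValueError("results must contain at least one problem")
    scores = [pass_at_k(len(v), sum(v), 1) for v in results.values()]
    return sum(scores) / len(scores)
```

Under this reading, a score of 53% pass@1 on a difficulty tier would mean that, averaged over the problems in that tier, a single sampled submission is accepted about 53% of the time.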
