Can LLMs Generate High-Quality Test Cases for Algorithm Problems? TestCase-Eval: A Systematic Evaluation of Fault Coverage and Exposure

Zheyuan Yang, Zexi Kuang, Xue Xia, Yilun Zhao


Abstract

We introduce TestCase-Eval, a new benchmark for the systematic evaluation of LLMs in test-case generation. TestCase-Eval includes 500 algorithm problems and 100,000 human-crafted solutions from the Codeforces platform. It focuses on two pivotal tasks: (1) Fault Coverage, which measures how well LLM-generated test sets probe diverse input scenarios and cover a wide range of potential failure modes, and (2) Fault Exposure, which evaluates whether LLMs can craft a tailored test input that reveals a specific incorrect code implementation. We provide a comprehensive assessment of 19 state-of-the-art open-source and proprietary LLMs on TestCase-Eval, offering insights into their strengths and limitations in generating effective test cases for algorithm problems.
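
To make the two tasks concrete, the sketch below shows one plausible way such metrics could be computed; it is not the paper's exact protocol. The helper names (run_solution, fault_coverage, fault_exposure) and the pass/fail criterion (a buggy solution's output differing from a reference solution's output) are assumptions for illustration only.

```python
import subprocess
from typing import List, Optional

def run_solution(cmd: List[str], test_input: str, timeout: float = 2.0) -> Optional[str]:
    # Execute one candidate solution on a single test input;
    # return its stdout, or None on a non-zero exit or timeout.
    try:
        proc = subprocess.run(
            cmd, input=test_input, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.stdout if proc.returncode == 0 else None

def fault_coverage(test_set: List[str], buggy_cmds: List[List[str]],
                   reference_cmd: List[str]) -> float:
    # Fraction of buggy solutions failed by at least one test in the generated
    # test set (a solution "fails" when its output differs from the reference
    # solution's output -- an assumed criterion, not the paper's definition).
    exposed = 0
    for buggy in buggy_cmds:
        for test in test_set:
            expected = run_solution(reference_cmd, test)
            actual = run_solution(buggy, test)
            if expected is not None and actual != expected:
                exposed += 1
                break
    return exposed / len(buggy_cmds) if buggy_cmds else 0.0

def fault_exposure(targeted_test: str, buggy_cmd: List[str],
                   reference_cmd: List[str]) -> bool:
    # True when a single tailored test input distinguishes one specific buggy
    # implementation from the reference solution.
    expected = run_solution(reference_cmd, targeted_test)
    actual = run_solution(buggy_cmd, targeted_test)
    return expected is not None and actual != expected
```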
