TACO: Topics in Algorithmic COde generation dataset

Rongao Li Jie Fu Bo-Wen Zhang Tao Huang Zhihong Sun Chen Lyu Guang Liu Zhi Jin Ge Li

Abstract

We introduce TACO, an open-source, large-scale code generation dataset focused on algorithmic topics, designed to provide a more challenging training dataset and evaluation benchmark for code generation models. TACO consists of competition-level programming problems, intended to improve or evaluate problem understanding and reasoning abilities in real-world programming scenarios. The training and test sets contain 25,433 and 1,000 coding problems, respectively, along with up to 1.55 million diverse solutions. Moreover, each TACO problem includes several fine-grained labels such as task topics, algorithms, programming skills, and difficulty levels, providing a more precise reference for the training and evaluation of code generation models. The dataset and evaluation scripts are available on Hugging Face Hub (https://huggingface.co/datasets/BAAI/TACO) and GitHub (https://github.com/FlagOpen/TACO).
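
To illustrate how fine-grained labels of this kind might be used to carve out a training or evaluation subset, here is a minimal sketch. It works on a few hypothetical in-memory records. The field names (`difficulty`, `topics`), the label values, and the helper functions are assumptions made for illustration and are not taken from the dataset's documented schema.

```python
from collections import Counter

# Hypothetical records: field names and label values are illustrative
# assumptions, not the confirmed schema of the TACO dataset.
PROBLEMS = [
    {"id": 1, "difficulty": "EASY", "topics": ["Sorting"]},
    {"id": 2, "difficulty": "HARD", "topics": ["Graph algorithms"]},
    {"id": 3, "difficulty": "EASY", "topics": ["Graph algorithms", "Sorting"]},
]


def select_problems(problems, difficulty=None, topic=None):
    """Return the problems that match the requested difficulty and/or topic."""
    selected = []
    for problem in problems:
        if difficulty is not None and problem["difficulty"] != difficulty:
            continue
        if topic is not None and topic not in problem["topics"]:
            continue
        selected.append(problem)
    return selected


def topic_histogram(problems):
    """Count how many problems carry each topic label."""
    return Counter(t for p in problems for t in p["topics"])


easy_ids = [p["id"] for p in select_problems(PROBLEMS, difficulty="EASY")]
print(easy_ids)
print(topic_histogram(PROBLEMS))
```

The same filtering logic could be applied to records loaded from the Hugging Face Hub once the actual field names have been checked against the dataset card.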

Code Repositories

flagopen/taco (Official, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
code-generation-on-taco-code | CodeLlama-7B-Python | easy pass@1: 9.32%
code-generation-on-taco-code | Starcoder-15.5B | easy pass@1: 11.6%
code-generation-on-taco-code | GPT-4 | easy pass@1: 31.50%
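
The page does not say how the pass@1 figures above were computed. As a sketch, and assuming the commonly used unbiased pass@k estimator (1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass all tests), the metric could be calculated as follows. The function names and the sample counts in the usage lines are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of generated samples, c: number of samples that pass all
    tests, k: the sampling budget being evaluated.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k: int) -> float:
    """Average pass@k over (n, c) pairs, one pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical outcomes: (samples generated, samples correct) per problem.
results = [(10, 3), (10, 0), (10, 10)]
print(f"pass@1 = {mean_pass_at_k(results, k=1):.4f}")
```

For k = 1 the estimate reduces to c / n per problem, so the reported percentage would be the average fraction of correct samples across the problems in a difficulty tier, if this estimator is the one in use.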
