KodCode-V1 Coding Synthetic Dataset

Date: 8 months ago
Size: 1.99 GB
Organization: Microsoft, University of Washington
Paper URL: arxiv.org
License: CC BY 4.0

KodCode was released in 2025 by researchers from Microsoft GenAI, the University of Washington, and the University of Texas at Austin. The related paper is "KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding".

The dataset is described as the largest fully synthetic open-source dataset that provides verifiable solutions and tests for coding tasks. It contains 12 subsets covering various domains (from algorithms to package-specific knowledge) and difficulty levels (from basic coding exercises to interview and competitive programming challenges), and is designed for supervised fine-tuning (SFT) and reinforcement learning (RL) tuning.

[Figure: the 3-step process of generating KodCode-V1 (coding problem synthesis, solution and test generation, and post-training data synthesis), with the distribution of each subset shown on the right.] The final KodCode-V1 dataset contains 447K verified problem-solution-test triplets.
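To make the idea of a verified problem-solution-test triplet concrete, here is a minimal Python sketch of how a solution might be checked against its tests. It is an illustration only, not the authors' pipeline: the record keys (`question`, `solution`, `test`), the sample record, and the `verify_triplet` helper are all hypothetical and are not taken from the dataset documentation.

```python
from typing import Dict


def verify_triplet(solution_src: str, test_src: str) -> bool:
    """Return True if every test_* function in test_src passes against solution_src.

    Hypothetical helper for illustration. It runs code with exec() in-process,
    so it should only be used on trusted input or inside a sandbox.
    """
    namespace: Dict[str, object] = {}
    try:
        exec(solution_src, namespace)  # define the candidate solution
        exec(test_src, namespace)      # define the test functions
        tests = [
            obj for name, obj in namespace.items()
            if name.startswith("test_") and callable(obj)
        ]
        if not tests:
            return False  # no tests found, so nothing was verified
        for test in tests:
            test()  # a failing assert raises AssertionError
    except Exception:
        return False
    return True


# Hypothetical record; the field names are assumptions, not the dataset schema.
record = {
    "question": "Write a function add(a, b) that returns the sum of a and b.",
    "solution": "def add(a, b):\n    return a + b\n",
    "test": "def test_add():\n    assert add(2, 3) == 5\n    assert add(-1, 1) == 0\n",
}

if __name__ == "__main__":
    print(verify_triplet(record["solution"], record["test"]))
```

A record whose solution fails any assertion, or whose test source defines no `test_*` functions, is rejected by this sketch. A production setup would more likely run each test in an isolated subprocess with a timeout.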
KodCode-V1.torrent
Seeding: 1 · Downloading: 0 · Completed: 72 · Total downloads: 143
  • KodCode-V1/
    • README.md (1.61 KB)
    • README.txt (3.21 KB)
    • data/
      • KodCode-V1.zip (1.99 GB)
