PromptCoT-2.0-SFT-4.8M Supervised fine-tuning Prompt SFT Dataset

Date: 19 days ago

Size: 21.79 GB

Organization: The University of Hong Kong, Ant Group

Paper URL: 2509.19894

License: MIT

PromptCoT-2.0-SFT-4.8M is a large-scale synthetic prompt dataset released in 2025 by a research team from the University of Hong Kong and Ant Group. It accompanies the paper "PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning" and aims to provide a high-quality corpus of reasoning prompts for fine-tuning or self-training large language models.

The dataset contains approximately 4.8 million fully synthetic prompts with reasoning trajectories in both supervised fine-tuning and self-practice scenarios, covering two major reasoning areas: mathematics and programming.

Data composition:

  • In the supervised fine-tuning (SFT) scenario, a total of 4,766,890 prompts were synthesized, including:
    • 1,188,505 programming task prompts
    • 3,578,385 math task prompts
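The two category counts can be checked against the stated total, and their relative shares computed, with a short sketch like the one below. The counts are copied from the list above; the names `SFT_PROMPT_COUNTS`, `REPORTED_TOTAL`, and `composition_shares` are illustrative and do not come from the dataset itself.

```python
# Counts copied from the data composition list; the variable and
# function names here are illustrative, not part of the dataset.
SFT_PROMPT_COUNTS = {
    "programming": 1_188_505,
    "math": 3_578_385,
}
REPORTED_TOTAL = 4_766_890


def composition_shares(counts):
    """Return each category's fraction of the summed prompt count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total = sum(SFT_PROMPT_COUNTS.values())
shares = composition_shares(SFT_PROMPT_COUNTS)

print(f"sum of categories: {total:,}")
print(f"matches reported total: {total == REPORTED_TOTAL}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The categories sum to the reported total, with programming prompts making up roughly a quarter of the SFT set and math prompts roughly three quarters.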

PromptCoT-2.0-SFT-4.8M.torrent
  • PromptCoT-2.0-SFT-4.8M/
    • README.md (1.53 KB)
    • README.txt (3.06 KB)
    • data/
      • PromptCoT-2.0-SFT-4.8M.zip (21.79 GB)
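Because the archive is about 21.79 GB, it may be worth listing its contents before extracting anything. The following is a minimal sketch using Python's standard `zipfile` module. The helper name `summarize_archive` is hypothetical, the path assumes the torrent was downloaded into the current directory with `data/` directly under the top-level folder, and nothing is assumed about the file names or formats inside the archive.

```python
import zipfile
from pathlib import Path


def summarize_archive(zip_path, limit=10):
    """Return (name, uncompressed_size_in_bytes) for up to `limit` files in a zip."""
    with zipfile.ZipFile(zip_path) as archive:
        entries = [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()  # skip directory entries
        ]
    return entries[:limit]


# Assumed location of the downloaded archive; adjust to the actual path.
archive_path = Path("PromptCoT-2.0-SFT-4.8M/data/PromptCoT-2.0-SFT-4.8M.zip")

if archive_path.exists():
    for name, size in summarize_archive(archive_path):
        print(f"{size:>15,d}  {name}")
else:
    print(f"Archive not found at {archive_path}")
```

Reading the zip's central directory this way does not decompress the data, so it is a cheap way to see which files the archive contains before committing disk space to a full extraction.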
