Date

a year ago

Size

47.22 GB

Tags

Natural Language Processing

Reasoning

Model Training

AM-DeepSeek-R1-Distilled-1.4M is a large-scale general reasoning dataset released by am-team in March 2025. The accompanying paper is "1.4 Million Open-Source Distilled Reasoning Dataset to Empower Large Language Model Training". The dataset contains about 1.4 million entries covering mathematics, code, scientific Q&A, and general chat. The data was carefully selected, semantically deduplicated, and strictly cleaned to keep the problems high-quality and challenging. Each entry includes a detailed thinking trace, which gives a model worked examples of the reasoning process and helps it learn to produce solutions to complex reasoning tasks. The dataset is intended for training and improving the reasoning capabilities of large language models, particularly in mathematics, code, and scientific Q&A.

AM-DeepSeek-R1-Distilled-1.4M.torrent

Seeding: 1 | Downloading: 0 | Completed: 99 | Total Downloads: 233

AM-DeepSeek-R1-Distilled-1.4M/
- README.md (1.8 KB)
- README.txt (3.6 KB)

This dataset is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at support@hyper.ai for prompt review and removal.