
MiniMind Large Language Model Training and Fine-tuning Dataset

Date

8 months ago

Size

8.08 GB

Publish URL

github.com

MiniMind is an open-source, lightweight large language model project that aims to lower the barrier to working with large language models (LLMs), enabling individual users to train and run inference quickly on ordinary hardware.

MiniMind bundles several datasets: a tokenizer training set for building the tokenizer, Pretrain data for model pre-training, SFT data for supervised fine-tuning, and DPO data 1 and DPO data 2 for preference alignment via Direct Preference Optimization. These datasets are integrated from different sources, including SFT data from Jiangshu Technology and Qwen2.5 distillation data, and total roughly 3B tokens, making them suitable for pre-training Chinese LLMs.
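SFT corpora of this kind are typically distributed as JSONL, one conversation per line. A minimal sketch of parsing such a record is shown below; the field names (`conversations`, `role`, `content`) are assumptions for illustration, not the dataset's confirmed schema.

```python
import json

# Hypothetical SFT record in JSONL form. The schema below is an
# assumption for illustration; check the dataset's README for the
# actual field names before relying on it.
sample_line = json.dumps({
    "conversations": [
        {"role": "user", "content": "What is MiniMind?"},
        {"role": "assistant", "content": "A lightweight open-source LLM project."},
    ]
})

def parse_sft_record(line: str) -> list:
    """Parse one JSONL line into a list of chat turns."""
    record = json.loads(line)
    return record["conversations"]

turns = parse_sft_record(sample_line)
print(len(turns))        # number of turns in this record
print(turns[0]["role"])  # role of the first turn
```

In practice you would stream the file line by line (`for line in open(path)`) rather than loading 8 GB into memory at once.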

minimind_dataset.torrent
Seeding 1 · Downloading 0 · Completed 105 · Total Downloads 182
  • minimind_dataset/
    • README.md
      1.31 KB
    • README.txt
      2.63 KB
    • data/
      • minimind_dataset.zip
        8.08 GB
