MiniMind Large Model Training & Fine-tuning Datasets
MiniMind is an open-source, lightweight large language model (LLM) project that aims to lower the barrier to working with LLMs, enabling individual users to quickly train and run inference on ordinary devices.
MiniMind bundles multiple datasets: a tokenizer training set for training the tokenizer, Pretrain data for pre-training the model, SFT data for supervised fine-tuning, and DPO data 1 and DPO data 2 for preference alignment via Direct Preference Optimization. These datasets are integrated from different sources, such as SFT data from Jiangshu Technology and Qwen2.5 distillation data, totaling about 3B tokens, which makes them suitable for pre-training Chinese large language models.
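As a rough sketch of how pre-training corpora like these are typically consumed: such data is commonly distributed as JSON Lines, with one JSON record per line. The file name and the `"text"` field used below are illustrative assumptions, not MiniMind's confirmed schema.

```python
import json

# Hypothetical sample records mimicking a JSONL pretraining corpus.
# The "text" field name is an assumption for illustration only.
sample = [
    {"text": "MiniMind is a lightweight LLM project."},
    {"text": "The pretraining corpus totals roughly 3B tokens."},
]

path = "pretrain_sample.jsonl"  # assumed file name, not from MiniMind

# Write the records in JSON Lines format, one record per line.
with open(path, "w", encoding="utf-8") as f:
    for record in sample:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Reload the corpus by parsing each line independently.
with open(path, encoding="utf-8") as f:
    records = [json.loads(line) for line in f]

print(len(records))  # number of pretraining records
```

`ensure_ascii=False` matters for Chinese corpora: it keeps CJK characters readable in the file instead of escaping them to `\uXXXX` sequences.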