7 months ago

Xingxuan Li, Yao Xiao, Dianwen Ng, Hai Ye, Yue Deng, Xiang Lin, Bin Wang, Zhanfeng Mo, Chong Zhang, Yueyi Zhang

Abstract

Large language models have recently evolved from fluent text generation to advanced reasoning across diverse domains, giving rise to reasoning language models (RLMs). Among these domains, mathematical reasoning serves as a representative benchmark, as it requires precise multi-step logic and abstract reasoning that can be generalized to other tasks. While closed-source RLMs such as GPT-o3 demonstrate impressive reasoning capabilities, their proprietary nature limits transparency and reproducibility. Although many open-source projects aim to close this gap, most of them lack sufficient openness: they omit critical resources such as datasets and detailed training configurations, which hinders reproducibility. To contribute toward greater transparency in RLM development, we introduce the MiroMind-M1 series, a set of fully open-source RLMs built on the Qwen-2.5 backbone that match or exceed the performance of existing open-source RLMs. Specifically, our models are trained in two stages: SFT on a carefully curated corpus of 719K math-reasoning problems with verified CoT trajectories, followed by RLVR on 62K challenging and verifiable problems. To enhance the robustness and efficiency of the RLVR process, we introduce Context-Aware Multi-Stage Policy Optimization, an algorithm that integrates length-progressive training with an adaptive repetition penalty to encourage context-aware RL training. Our models achieve state-of-the-art or competitive performance and superior token efficiency among Qwen-2.5-based open-source 7B and 32B models on the AIME24, AIME25, and MATH benchmarks. To facilitate reproducibility, we release the complete stack: models (MiroMind-M1-SFT-7B, MiroMind-M1-RL-7B, MiroMind-M1-RL-32B); datasets (MiroMind-M1-SFT-719K, MiroMind-M1-RL-62K); and all training and evaluation configurations. We hope these resources will support further research and foster community advancement.
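The abstract names two ingredients of the RLVR stage, length-progressive training and a repetition penalty, without giving formulas. The sketch below is only an illustration of how such ingredients could be combined into a shaped reward; it is not the authors' algorithm. The stage lengths in `STAGE_MAX_TOKENS`, the n-gram-based `repetition_score`, the fixed `penalty_weight` (the paper describes an adaptive penalty, which this sketch does not model), and the function names are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical schedule: maximum response length (in tokens) for each training stage.
# These values are assumed for illustration and are not taken from the paper.
STAGE_MAX_TOKENS = [16_384, 32_768]


def repetition_score(tokens, n=4):
    """Fraction of n-grams that duplicate an earlier n-gram (0.0 means no repeats)."""
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(ngrams)
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(ngrams)


def shaped_reward(tokens, is_correct, stage, penalty_weight=0.5):
    """Verifiable 0/1 reward, reduced in proportion to how repetitive the response is.

    Responses longer than the current stage's budget get zero reward here
    (an assumption of this sketch).
    """
    if len(tokens) > STAGE_MAX_TOKENS[stage]:
        return 0.0
    base = 1.0 if is_correct else 0.0
    return base - penalty_weight * repetition_score(tokens)
```

Under these assumptions, moving from stage 0 to stage 1 only raises the length budget, while the penalty term lowers the reward of responses that loop on the same n-grams regardless of stage.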

MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization

Xingxuan Li, Yao Xiao, Dianwen Ng, Hai Ye, Yue Deng, Xiang Lin, Bin Wang, Zhanfeng Mo, Chong Zhang, Yueyi Zhang, and 8 more
