3 months ago

MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization


Abstract

Large language models have recently evolved from fluent text generation to advanced reasoning across diverse domains, giving rise to reasoning language models (RLMs). Among these domains, mathematical reasoning serves as a representative benchmark as it requires precise multi-step logic and abstract reasoning, which can be generalized to other tasks. While closed-source RLMs such as GPT-o3 demonstrate impressive reasoning capabilities, their proprietary nature limits transparency and reproducibility. Although many open-source projects aim to close this gap, most of them lack sufficient openness by omitting critical resources such as datasets and detailed training configurations, which hinders reproducibility. To contribute toward greater transparency in RLM development, we introduce the MiroMind-M1 series, a set of fully open-source RLMs built on the Qwen-2.5 backbone that match or exceed the performance of existing open-source RLMs. Specifically, our models are trained in two stages: SFT on a carefully curated corpus of 719K math-reasoning problems with verified CoT trajectories, followed by RLVR on 62K challenging and verifiable problems. To enhance the robustness and efficiency of the RLVR process, we introduce Context-Aware Multi-Stage Policy Optimization, an algorithm that integrates length-progressive training with an adaptive repetition penalty to encourage context-aware RL training. Our model achieves state-of-the-art or competitive performance and superior token efficiency among Qwen-2.5-based open-source 7B and 32B models on the AIME24, AIME25, and MATH benchmarks. To facilitate reproducibility, we release the complete stack: models (MiroMind-M1-SFT-7B, MiroMind-M1-RL-7B, MiroMind-M1-RL-32B); datasets (MiroMind-M1-SFT-719K, MiroMind-M1-RL-62K); and all training and evaluation configurations. We hope these resources will support further research and foster community advancement.
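
The abstract names two ingredients of Context-Aware Multi-Stage Policy Optimization, length-progressive training and a repetition penalty, but does not give their exact form. The Python sketch below is only an illustration of how such ingredients could be combined with a verifiable (correct/incorrect) reward. Everything in it is an assumption made for illustration: the n-gram repetition measure, the `penalty_weight` value, the function names, and the stage boundaries and length limits are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def repetition_ratio(tokens: Sequence[str], n: int = 3) -> float:
    """Fraction of n-grams that duplicate an earlier n-gram (0.0 means no repeats).

    Assumption: n-gram overlap is used here as a simple stand-in for
    whatever repetition measure the paper actually uses.
    """
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(ngrams)
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(ngrams)


def shaped_reward(correct: bool, tokens: Sequence[str],
                  penalty_weight: float = 0.5) -> float:
    """Binary verifiable reward minus a repetition penalty (hypothetical weighting)."""
    base = 1.0 if correct else 0.0
    return base - penalty_weight * repetition_ratio(tokens)


def max_length_for_step(step: int, stage_ends: List[int],
                        stage_lengths: List[int]) -> int:
    """Length-progressive schedule: later stages allow longer responses.

    stage_ends[i] is the first training step that no longer belongs to stage i.
    Steps past the last boundary keep the final stage's limit.
    """
    for end, length in zip(stage_ends, stage_lengths):
        if step < end:
            return length
    return stage_lengths[-1]


# Illustrative values only; not the paper's settings.
looping = "a b c a b c".split()
print(shaped_reward(True, looping))
print(max_length_for_step(150, [100, 200], [16384, 32768]))
```

In this sketch a correct but repetitive response earns less than a correct, non-repetitive one, and the response-length cap grows as training moves through stages; the released training configurations are the place to look for the actual settings.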
