MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention


Abstract

We introduce MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. MiniMax-M1 is powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning attention mechanism. The model is developed based on our previous MiniMax-Text-01 model, which contains a total of 456 billion parameters with 45.9 billion parameters activated per token. The M1 model natively supports a context length of 1 million tokens, 8x the context size of DeepSeek R1. Furthermore, the lightning attention mechanism in MiniMax-M1 enables efficient scaling of test-time compute. These properties make M1 particularly suitable for complex tasks that require processing long inputs and thinking extensively. MiniMax-M1 is trained using large-scale reinforcement learning (RL) on diverse problems including sandbox-based, real-world software engineering environments. In addition to M1's inherent efficiency advantage for RL training, we propose CISPO, a novel RL algorithm to further enhance RL efficiency. CISPO clips importance sampling weights rather than token updates, outperforming other competitive RL variants. Combining hybrid-attention and CISPO enables MiniMax-M1's full RL training on 512 H800 GPUs to complete in only three weeks, with a rental cost of just $534,700. We release two versions of MiniMax-M1 models with 40K and 80K thinking budgets respectively, where the 40K model represents an intermediate phase of the 80K training. Experiments on standard benchmarks show that our models are comparable or superior to strong open-weight models such as the original DeepSeek-R1 and Qwen3-235B, with particular strengths in complex software engineering, tool utilization, and long-context tasks. We publicly release MiniMax-M1 at https://github.com/MiniMax-AI/MiniMax-M1.
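The abstract credits lightning attention with making million-token contexts and long test-time reasoning affordable. As a rough illustration of why the linear-attention family scales better than softmax attention, here is a minimal, non-causal linear-attention sketch in PyTorch; the function name and the ELU feature map are illustrative assumptions, and this is not the actual lightning attention kernel (which is a tiled, hardware-efficient causal variant).

```python
import torch

def linear_attention(q, k, v):
    """Minimal non-causal linear-attention sketch (illustrative, not the
    lightning attention kernel).

    q, k, v: (batch, seq_len, dim)

    Softmax attention materializes an (n x n) score matrix, so its cost
    grows quadratically in sequence length n. Linear attention reorders
    the computation as q @ (k^T v), accumulating a (dim x dim) state
    instead, so cost grows linearly in n.
    """
    q = torch.nn.functional.elu(q) + 1          # positive feature map phi(q)
    k = torch.nn.functional.elu(k) + 1          # positive feature map phi(k)
    kv = torch.einsum('bnd,bne->bde', k, v)     # sum_n k_n v_n^T  (dim x dim state)
    z = k.sum(dim=1)                            # normalizer: sum_n k_n
    out = torch.einsum('bnd,bde->bne', q, kv)   # q_t @ state, per position
    denom = torch.einsum('bnd,bd->bn', q, z).unsqueeze(-1)
    return out / (denom + 1e-6)
    # A causal (autoregressive) version would replace the global sums with
    # running prefix sums over positions 1..t.
```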
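The abstract states that CISPO clips importance sampling weights rather than token updates. The distinction matters because PPO-style ratio clipping zeroes the gradient of any token whose ratio leaves the trust region, while clipping the weight itself keeps a bounded gradient flowing through every token. Below is a minimal sketch of what such an objective could look like, assuming a REINFORCE-style surrogate with a detached, clipped importance weight; the function name, tensor shapes, and epsilon defaults are illustrative assumptions, not the official MiniMax-M1 implementation.

```python
import torch

def cispo_style_loss(logp_new, logp_old, advantages,
                     eps_low=0.1, eps_high=0.1):
    """Sketch of a CISPO-style objective (hypothetical helper).

    logp_new:   per-token log-probs under the current policy, shape (T,)
    logp_old:   per-token log-probs under the behavior policy, shape (T,)
    advantages: per-token advantage estimates, shape (T,)
    """
    ratio = torch.exp(logp_new - logp_old)               # importance weight r_t
    clipped = torch.clamp(ratio, 1 - eps_low, 1 + eps_high)
    weight = clipped.detach()                            # stop-gradient on the weight
    # Gradient flows only through logp_new; unlike PPO's clipped surrogate,
    # no token's gradient is zeroed out, it is merely bounded.
    return -(weight * advantages * logp_new).mean()
```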

Code Repositories

minimax-ai/minimax-m1 (official, PyTorch)

