Mistral-AI: Abhinav Rastogi, Albert Q. Jiang, Andy Lo, Gabrielle Berrada, Guillaume Lample, et al.

Abstract
We introduce Magistral, Mistral's first reasoning model and our own scalable reinforcement learning (RL) pipeline. Instead of relying on existing implementations and RL traces distilled from prior models, we follow a ground-up approach, relying solely on our own models and infrastructure. Notably, we demonstrate a stack that enabled us to explore the limits of pure RL training of LLMs, present a simple method to force the reasoning language of the model, and show that RL on text data alone maintains most of the initial checkpoint's capabilities. We find that RL on text maintains or improves multimodal understanding, instruction following and function calling. We present Magistral Medium, trained for reasoning on top of Mistral Medium 3 with RL alone, and we open-source Magistral Small (Apache 2.0), which further includes cold-start data from Magistral Medium.