2 months ago

rStar2-Agent: Agentic Reasoning Technical Report

Abstract

We introduce rStar2-Agent, a 14B math reasoning model trained with agentic reinforcement learning to achieve frontier-level performance. Beyond current long CoT, the model demonstrates advanced cognitive behaviors, such as thinking carefully before using Python coding tools and reflecting on code execution feedback to autonomously explore, verify, and refine intermediate steps in complex problem-solving. This capability is enabled through three key innovations that make agentic RL effective at scale: (i) an efficient RL infrastructure with a reliable Python code environment that supports high-throughput execution and mitigates the high rollout costs, enabling training on limited GPU resources (64 MI300X GPUs); (ii) GRPO-RoC, an agentic RL algorithm with a Resample-on-Correct rollout strategy that addresses the inherent environment noise from coding tools, allowing the model to reason more effectively in a code environment; (iii) an efficient agent training recipe that starts with non-reasoning SFT and progresses through multiple RL stages, yielding advanced cognitive abilities with minimal compute cost. As a result, rStar2-Agent boosts a pre-trained 14B model to state of the art in only 510 RL steps within one week, achieving average pass@1 scores of 80.6% on AIME24 and 69.8% on AIME25, surpassing DeepSeek-R1 (671B) with significantly shorter responses. Beyond mathematics, rStar2-Agent-14B also demonstrates strong generalization to alignment, scientific reasoning, and agentic tool-use tasks. Code and training recipes are available at https://github.com/microsoft/rStar.
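
The abstract names the Resample-on-Correct strategy but does not spell out its mechanics. The sketch below shows one plausible reading, under assumptions that are not taken from the abstract: rollouts are oversampled, incorrect rollouts are downsampled uniformly, and correct rollouts with fewer tool errors are kept first. The `Rollout` structure, the function names, the binary reward, and the ratio-preserving rule are all hypothetical. Only the group-normalized advantage follows the general GRPO idea. Refer to the linked repository for the authors' actual implementation.

```python
import random
from dataclasses import dataclass


@dataclass
class Rollout:
    """One sampled trajectory (hypothetical structure for illustration)."""
    correct: bool      # did the final answer match the reference?
    tool_errors: int   # number of failed code executions in the trace


def resample_on_correct(rollouts, group_size, rng):
    """Downsample an oversampled group to at most `group_size` rollouts.

    Assumed policy: incorrect rollouts are sampled uniformly, while correct
    rollouts are ranked so that traces with fewer tool errors are kept first.
    """
    positives = sorted((r for r in rollouts if r.correct),
                       key=lambda r: r.tool_errors)
    negatives = [r for r in rollouts if not r.correct]
    if not positives or not negatives:
        # Nothing to balance: fall back to a uniform sample.
        return rng.sample(rollouts, min(group_size, len(rollouts)))
    # Assumption: keep roughly the positive/negative ratio of the full group.
    n_pos = round(group_size * len(positives) / len(rollouts))
    n_pos = max(1, min(n_pos, group_size - 1, len(positives)))
    n_neg = min(group_size - n_pos, len(negatives))
    return positives[:n_pos] + rng.sample(negatives, n_neg)


def group_advantages(rollouts):
    """GRPO-style advantages: rewards normalized within the group.

    Assumes a binary outcome reward (1 for correct, 0 otherwise).
    """
    rewards = [1.0 if r.correct else 0.0 for r in rollouts]
    mean = sum(rewards) / len(rewards)
    std = (sum((x - mean) ** 2 for x in rewards) / len(rewards)) ** 0.5
    return [(x - mean) / (std + 1e-6) for x in rewards]
```

In this reading, the filtering step is what counters environment noise: a correct answer reached through many failed tool calls is less likely to be reinforced than a correct answer reached cleanly, while failures stay in the group as negative signal.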
