
Abstract
As large language models (LLMs) advance in conversational and reasoning capabilities, their practical application in healthcare has become a critical research focus. However, there is a notable gap between the performance of medical LLMs on static benchmarks such as USMLE and their utility in real-world clinical decision-making. This discrepancy arises because traditional exams fail to capture the dynamic, interactive nature of medical consultations. To address this challenge, we introduce a novel dynamic verification framework that moves beyond static answer verifiers, establishing a large-scale, high-fidelity interactive reinforcement learning system. Our framework comprises two key components: a Patient Simulator that creates realistic clinical environments using de-identified medical records, and a Clinical Rubrics Generator that dynamically produces multi-dimensional evaluation metrics. Building on this foundation, we develop Baichuan-M2, a 32B-parameter medical augmented reasoning model trained through a multi-stage reinforcement learning strategy with an improved Group Relative Policy Optimization (GRPO) algorithm. Evaluated on HealthBench, Baichuan-M2 outperforms all other open-source models and most advanced closed-source counterparts, achieving a score above 32 on the challenging HealthBench Hard benchmark, a threshold previously exceeded only by GPT-5. Our work demonstrates that a robust dynamic verification system is essential for aligning LLM capabilities with practical clinical applications, establishing a new Pareto front in the performance-parameter trade-off for medical AI deployment.
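For readers unfamiliar with the GRPO algorithm mentioned above, its core idea is to replace a learned value baseline with a group-relative one: several responses are sampled per prompt, and each response's advantage is its reward centered and scaled by the group's statistics. The sketch below illustrates only that baseline computation, not the paper's improved variant, whose modifications are not described in this abstract; the exact normalization (sample vs. population standard deviation, clipping) varies across implementations.

```python
import statistics

def grpo_advantages(group_rewards: list[float]) -> list[float]:
    """Group-relative advantages for one prompt's sampled responses.

    Each advantage is (reward - group mean) / group std, so the group
    itself acts as the baseline instead of a learned value function.
    This is a minimal sketch of standard GRPO, not Baichuan-M2's
    improved variant.
    """
    mean = statistics.mean(group_rewards)
    # Guard against zero std when all sampled responses score identically.
    std = statistics.pstdev(group_rewards) or 1.0
    return [(r - mean) / std for r in group_rewards]
```

A usage note: advantages within a group always sum to zero, so responses scoring above the group mean are reinforced and those below it are penalized, with no separate critic network required.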