7 months ago

Zixiao Wang Yuxin Wang Xiaorui Wang Mengting Xing Jie Gao Jianjun Xu Guangcan Liu Chenhui Jin Zhuo Wang Shengzhuo Zhang

Abstract

We introduce our first reflective generative model, MetaStone-S1, which matches the performance of OpenAI o3-mini by means of a self-supervised process reward model (SPRM). By sharing the backbone network and using task-specific heads for next-token prediction and process scoring respectively, SPRM integrates the policy model and the process reward model (PRM) into a unified interface without extra process annotation, cutting PRM parameters by more than 99% for efficient reasoning. Equipped with SPRM, MetaStone-S1 is naturally suited to test-time scaling (TTS), and we provide three reasoning effort modes (low, medium, and high) based on the controllable thinking length. Moreover, we empirically establish a scaling law that relates total thinking computation to TTS performance. Experiments demonstrate that MetaStone-S1 achieves performance comparable to the OpenAI o3-mini series with only 32B parameters. To support the research community, we have open-sourced MetaStone-S1 at https://github.com/MetaStone-AI/MetaStone-S1.
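As a rough illustration of how a process scorer can drive test-time scaling, the sketch below samples several candidate reasoning trajectories, scores each step, and keeps the trajectory with the best aggregate score. It is a minimal sketch under stated assumptions, not the released implementation: the names `EFFORT_TO_CANDIDATES`, `trajectory_score`, and `select_best` are hypothetical, the candidate counts per effort mode are illustrative, and aggregating step scores with a geometric mean is an assumed choice that the abstract does not specify.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical mapping from reasoning effort mode to the number of sampled
# candidate trajectories. The values are illustrative assumptions.
EFFORT_TO_CANDIDATES = {"low": 2, "medium": 8, "high": 32}


def trajectory_score(step_scores: Sequence[float]) -> float:
    """Aggregate per-step scores in (0, 1] with a geometric mean.

    The geometric mean is an assumed aggregation rule for this sketch.
    """
    if not step_scores:
        return 0.0
    # Clamp to avoid log(0) when a step receives a zero score.
    log_sum = sum(math.log(max(s, 1e-12)) for s in step_scores)
    return math.exp(log_sum / len(step_scores))


def select_best(
    candidates: List[List[str]],
    score_steps: Callable[[List[str]], List[float]],
) -> List[str]:
    """Return the candidate trajectory with the highest aggregate score.

    `score_steps` stands in for the process-scoring head: it maps a list of
    reasoning steps to one score per step.
    """
    if not candidates:
        raise ValueError("at least one candidate trajectory is required")
    return max(candidates, key=lambda c: trajectory_score(score_steps(c)))


# Toy stand-in for the scoring head: a fixed lookup table of step scores.
TOY_SCORES = {"guess the answer": 0.4, "set up equation": 0.9, "solve directly": 0.7}


def toy_score_steps(steps: List[str]) -> List[float]:
    return [TOY_SCORES[s] for s in steps]


toy_candidates = [
    ["set up equation", "guess the answer"],
    ["solve directly"],
]
best = select_best(toy_candidates, toy_score_steps)
```

In a real system the scorer would share its backbone with the policy model, so scoring the sampled steps adds little cost beyond the small scoring head; here a lookup table plays that role only to keep the example self-contained.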

Test-Time Scaling with Reflective Generative Model
