6 months ago

Yuchen Fan Kaiyan Zhang Heng Zhou Yuxin Zuo Yanxu Chen Yu Fu Xinwei Long Xuekai Zhu Che Jiang Yuchen Zhang

Abstract

We investigate the potential of large language models (LLMs) to serve as efficient simulators for agentic search tasks in reinforcement learning (RL), thereby reducing dependence on costly interactions with external search engines. To this end, we first quantify the intrinsic search capability of LLMs via structured prompting and repeated sampling, which we term Self-Search. Our results reveal that LLMs exhibit strong scaling behavior with respect to the inference budget, achieving high pass@k on question-answering benchmarks, including the challenging BrowseComp task. Building on these observations, we introduce Self-Search RL (SSRL), which enhances LLMs' Self-Search capability through format-based and rule-based rewards. SSRL enables models to iteratively refine their knowledge utilization internally, without requiring access to external tools. Empirical evaluations demonstrate that SSRL-trained policy models provide a cost-effective and stable environment for search-driven RL training, reducing reliance on external search engines and facilitating robust sim-to-real transfer. We draw the following conclusions: 1) LLMs possess world knowledge that can be effectively elicited to achieve high performance; 2) SSRL demonstrates the potential of leveraging internal knowledge to reduce hallucination; 3) SSRL-trained models integrate seamlessly with external search engines without additional effort. Our findings highlight the potential of LLMs to support more scalable RL agent training.
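The abstract reports pass@k under repeated sampling but does not say how it is computed. As a minimal sketch, assuming the widely used unbiased estimator (draw n samples per question, count the c correct ones, and estimate the chance that at least one of k draws is correct), the metric could be computed as below. The function names `pass_at_k` and `mean_pass_at_k` are illustrative and not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one question.

    n: number of sampled answers, c: how many of them are correct,
    k: inference budget being evaluated (1 <= k <= n).
    Assumption: this is the standard unbiased estimator
    1 - C(n - c, k) / C(n, k); the paper may compute pass@k differently.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset has a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(counts: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over questions, given (n, c) pairs per question."""
    if not counts:
        raise ValueError("counts must be non-empty")
    return sum(pass_at_k(n, c, k) for n, c in counts) / len(counts)
```

Evaluating the same (n, c) counts at increasing k gives the kind of scaling curve over inference budget that the abstract describes.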

Reinforcement Learning

LLM

Agent

Method/Architecture
SSRL: Self-Search Reinforcement Learning

Yuchen Fan Kaiyan Zhang Heng Zhou Yuxin Zuo Yanxu Chen Yu Fu Xinwei Long Xuekai Zhu Che Jiang Yuchen Zhang and 8 more