AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning

Abstract
Developing autonomous LLM agents capable of making a series of intelligent decisions to solve complex, real-world tasks is a fast-evolving frontier. Like human cognitive development, agents are expected to acquire knowledge and skills through exploration and interaction with the environment. Despite advances, the community still lacks a unified, interactive reinforcement learning (RL) framework that can effectively train such agents from scratch -- without relying on supervised fine-tuning (SFT) -- across diverse and realistic environments. To bridge this gap, we introduce AgentGym-RL, a new framework to train LLM agents for multi-turn interactive decision-making through RL. The framework features a modular and decoupled architecture, ensuring high flexibility and extensibility. It encompasses a wide variety of real-world scenarios, and supports mainstream RL algorithms. Furthermore, we propose ScalingInter-RL, a training approach designed for exploration-exploitation balance and stable RL optimization. In early stages, it emphasizes exploitation by restricting the number of interactions, and gradually shifts towards exploration with larger horizons to encourage diverse problem-solving strategies. In this way, the agent develops more diverse behaviors and is less prone to collapse under long horizons. We perform extensive experiments to validate the stability and effectiveness of both the AgentGym-RL framework and the ScalingInter-RL approach. Our agents match or surpass commercial models on 27 tasks across diverse environments. We offer key insights and will open-source the complete AgentGym-RL framework -- including code and datasets -- to empower the research community in developing the next generation of intelligent agents.
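The core idea behind ScalingInter-RL, as described above, is to cap the interaction horizon early in training and grow it over time. The sketch below illustrates one simple way such a schedule could look; the function name, stage sizes, and turn limits are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a progressive interaction-horizon schedule in the
# spirit of ScalingInter-RL: few turns early (exploitation), more turns
# later (exploration). All names and constants here are illustrative.

def interaction_horizon(step: int,
                        warmup_steps: int = 100,
                        min_turns: int = 4,
                        max_turns: int = 20,
                        stage_size: int = 50) -> int:
    """Return the maximum number of agent-environment turns allowed
    at a given training step."""
    if step < warmup_steps:
        # Early stage: restrict interactions to encourage exploitation.
        return min_turns
    # After warmup, enlarge the horizon in discrete stages.
    stages = (step - warmup_steps) // stage_size + 1
    return min(max_turns, min_turns + 2 * stages)


if __name__ == "__main__":
    for s in (0, 100, 200, 400):
        print(f"step {s}: max turns = {interaction_horizon(s)}")
```

A smooth (e.g., linear or cosine) ramp would serve the same purpose; the essential design choice is that the rollout horizon passed to the RL trainer increases as training progresses.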