AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning

Abstract
Developing autonomous LLM agents capable of making a series of intelligent decisions to solve complex, real-world tasks is a fast-evolving frontier. Like human cognitive development, agents are expected to acquire knowledge and skills through exploration and interaction with the environment. Despite advances, the community still lacks a unified, interactive reinforcement learning (RL) framework that can effectively train such agents from scratch -- without relying on supervised fine-tuning (SFT) -- across diverse and realistic environments. To bridge this gap, we introduce AgentGym-RL, a new framework to train LLM agents for multi-turn interactive decision-making through RL. The framework features a modular and decoupled architecture, ensuring high flexibility and extensibility. It encompasses a wide variety of real-world scenarios, and supports mainstream RL algorithms. Furthermore, we propose ScalingInter-RL, a training approach designed for exploration-exploitation balance and stable RL optimization. In early stages, it emphasizes exploitation by restricting the number of interactions, and gradually shifts towards exploration with larger horizons to encourage diverse problem-solving strategies. In this way, the agent develops more diverse behaviors and is less prone to collapse under long horizons. We perform extensive experiments to validate the stability and effectiveness of both the AgentGym-RL framework and the ScalingInter-RL approach. Our agents match or surpass commercial models on 27 tasks across diverse environments. We offer key insights and will open-source the complete AgentGym-RL framework -- including code and datasets -- to empower the research community in developing the next generation of intelligent agents.
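The core idea behind ScalingInter-RL, as described above, is to cap the interaction horizon early in training and grow it over time. The sketch below illustrates one simple way such a schedule could look; the function name, stage sizes, and turn limits are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a progressive interaction-horizon schedule in the
# spirit of ScalingInter-RL: few turns early (exploitation), more turns
# later (exploration). All names and constants here are illustrative.

def interaction_horizon(step: int,
                        warmup_steps: int = 100,
                        min_turns: int = 4,
                        max_turns: int = 20,
                        stage_size: int = 50) -> int:
    """Return the maximum number of agent-environment turns allowed
    at a given training step."""
    if step < warmup_steps:
        # Early stage: restrict interactions to encourage exploitation.
        return min_turns
    # After warmup, enlarge the horizon in discrete stages.
    stages = (step - warmup_steps) // stage_size + 1
    return min(max_turns, min_turns + 2 * stages)


if __name__ == "__main__":
    for s in (0, 100, 200, 400):
        print(f"step {s}: max turns = {interaction_horizon(s)}")
```

A smooth (e.g., linear or cosine) ramp would serve the same purpose; the essential design choice is that the rollout horizon passed to the RL trainer increases as training progresses.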