2 months ago

AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning

Abstract

Developing autonomous LLM agents capable of making a series of intelligent decisions to solve complex, real-world tasks is a fast-evolving frontier. Like human cognitive development, agents are expected to acquire knowledge and skills through exploration and interaction with the environment. Despite advances, the community still lacks a unified, interactive reinforcement learning (RL) framework that can effectively train such agents from scratch -- without relying on supervised fine-tuning (SFT) -- across diverse and realistic environments. To bridge this gap, we introduce AgentGym-RL, a new framework to train LLM agents for multi-turn interactive decision-making through RL. The framework features a modular and decoupled architecture, ensuring high flexibility and extensibility. It encompasses a wide variety of real-world scenarios, and supports mainstream RL algorithms. Furthermore, we propose ScalingInter-RL, a training approach designed for exploration-exploitation balance and stable RL optimization. In early stages, it emphasizes exploitation by restricting the number of interactions, and gradually shifts towards exploration with larger horizons to encourage diverse problem-solving strategies. In this way, the agent develops more diverse behaviors and is less prone to collapse under long horizons. We perform extensive experiments to validate the stability and effectiveness of both the AgentGym-RL framework and the ScalingInter-RL approach. Our agents match or surpass commercial models on 27 tasks across diverse environments. We offer key insights and will open-source the complete AgentGym-RL framework -- including code and datasets -- to empower the research community in developing the next generation of intelligent agents.
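
The horizon-scaling idea behind ScalingInter-RL can be illustrated with a small sketch. The abstract only states that the number of interactions is restricted early and enlarged gradually; the staircase schedule below, the function name max_turns_for_step, and every numeric default are hypothetical choices made for illustration, not the authors' implementation.

```python
def max_turns_for_step(
    step: int,
    start_turns: int = 5,       # assumed initial interaction budget
    turns_increment: int = 5,   # assumed growth per phase
    steps_per_phase: int = 100, # assumed phase length in training steps
    turn_cap: int = 20,         # assumed upper bound on the horizon
) -> int:
    """Return the maximum number of agent-environment turns allowed
    in a rollout at the given training step.

    Early phases use a short horizon (favoring exploitation); each later
    phase enlarges the horizon (allowing more exploration) up to turn_cap.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    phase = step // steps_per_phase
    return min(start_turns + phase * turns_increment, turn_cap)


if __name__ == "__main__":
    # Show how the interaction budget grows over training.
    for step in (0, 99, 100, 250, 1000):
        print(f"step {step:>4}: up to {max_turns_for_step(step)} turns")
```

In a training loop, a rollout collector would presumably truncate each episode once the returned turn budget is reached; how the framework itself enforces the limit is not described in the abstract.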
