a month ago

GEM: A Gym for Agentic LLMs

Abstract

The training paradigm for large language models (LLMs) is moving from static datasets to experience-based learning, where agents acquire skills by interacting with complex environments. To facilitate this transition, we introduce GEM (General Experience Maker), an open-source environment simulator designed for the age of LLMs. Analogous to OpenAI-Gym for traditional reinforcement learning (RL), GEM provides a standardized framework for the environment-agent interface, including asynchronous vectorized execution for high throughput and flexible wrappers for easy extensibility. GEM also features a diverse suite of environments, robust integrated tools, and single-file example scripts demonstrating how to use GEM with five popular RL training frameworks. In addition, we provide a set of baselines across 24 environments using REINFORCE with Return Batch Normalization (ReBN), which, unlike GRPO, is compatible with the full RL setting of dense per-turn rewards and offers better credit assignment. We further conduct apples-to-apples benchmarking of PPO, GRPO and REINFORCE in both single- and multi-turn settings using GEM to shed light on the algorithmic designs. Lastly, GEM also functions as a convenient evaluation toolkit in addition to a training environment. We hope this framework can help accelerate future agentic LLM research.
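
The abstract names REINFORCE with Return Batch Normalization but does not spell out the computation. The following is a minimal sketch of one plausible reading, assuming that ReBN means computing a discounted return-to-go for every turn and then standardizing those returns with the mean and standard deviation taken over all turns in the batch. The function names (`returns_to_go`, `normalize_returns_over_batch`), the `gamma` and `eps` parameters, and the normalization details are illustrative assumptions, not GEM's actual API or the authors' exact formulation.

```python
import math
from typing import List


def returns_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Discounted return from each turn to the end of one episode."""
    out = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each turn accumulates all later rewards.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


def normalize_returns_over_batch(
    batch_rewards: List[List[float]], gamma: float = 1.0, eps: float = 1e-8
) -> List[List[float]]:
    """Standardize per-turn returns using statistics over the whole batch.

    Assumption: the mean and standard deviation are pooled over every turn
    of every episode in the batch. Episodes may have different lengths.
    """
    per_episode = [returns_to_go(r, gamma) for r in batch_rewards]
    flat = [g for episode in per_episode for g in episode]
    mean = sum(flat) / len(flat)
    var = sum((g - mean) ** 2 for g in flat) / len(flat)
    std = math.sqrt(var)
    return [[(g - mean) / (std + eps) for g in episode] for episode in per_episode]


if __name__ == "__main__":
    # Two hypothetical multi-turn episodes with dense per-turn rewards.
    batch = [[0.0, 1.0], [0.0, 0.0, 0.0]]
    for episode in normalize_returns_over_batch(batch):
        print([round(g, 3) for g in episode])
```

In a REINFORCE-style update, each normalized value would weight the log-probability of the action taken at the corresponding turn. Because every turn gets its own return, this sketch accepts dense per-turn rewards and episodes of unequal length, which is the setting the abstract contrasts with GRPO.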
