Abstract
The training paradigm for large language models (LLMs) is moving from static datasets to experience-based learning, where agents acquire skills by interacting with complex environments. To facilitate this transition, we introduce GEM (General Experience Maker), an open-source environment simulator designed for the age of LLMs. Analogous to OpenAI-Gym for traditional reinforcement learning (RL), GEM provides a standardized framework for the environment-agent interface, including asynchronous vectorized execution for high throughput and flexible wrappers for easy extensibility. GEM also features a diverse suite of environments, robust integrated tools, and single-file example scripts demonstrating how to use GEM with five popular RL training frameworks. In addition, we provide a set of baselines across 24 environments using REINFORCE with Return Batch Normalization (ReBN), which, unlike GRPO, is compatible with the full RL setting of dense per-turn rewards and offers better credit assignment. We further conduct apples-to-apples benchmarking of PPO, GRPO and REINFORCE in both single- and multi-turn settings using GEM to shed light on the algorithmic designs. Lastly, GEM also functions as a convenient evaluation toolkit in addition to a training environment. We hope this framework can help accelerate future agentic LLM research.
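The abstract names ReBN but does not spell out how it is computed. The following is a minimal sketch of one plausible reading, under stated assumptions: each turn's return is the discounted sum of that turn's reward and all later rewards in the episode, and those returns are then standardized using the mean and standard deviation over every turn in the batch. The function names, the `gamma` and `eps` parameters, and the choice of population standard deviation are illustrative assumptions, not GEM's API or the authors' exact method.

```python
import math
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Per-turn return G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def batch_normalized_returns(
    batch_rewards: List[List[float]],
    gamma: float = 1.0,
    eps: float = 1e-8,
) -> List[List[float]]:
    """Standardize per-turn returns over all turns in the batch (assumed ReBN reading).

    Episodes may have different lengths; the output keeps the same shape.
    """
    per_episode = [discounted_returns(r, gamma) for r in batch_rewards]
    flat = [g for episode in per_episode for g in episode]
    if not flat:
        raise ValueError("batch contains no turns")
    mean = sum(flat) / len(flat)
    # Population standard deviation over every turn in the batch.
    std = math.sqrt(sum((g - mean) ** 2 for g in flat) / len(flat))
    return [[(g - mean) / (std + eps) for g in episode] for episode in per_episode]


# Two episodes of different lengths with dense per-turn rewards.
batch = [[0.0, 1.0], [0.0, 0.0, 0.0]]
weights = batch_normalized_returns(batch)
```

In a REINFORCE-style update, each normalized value would weight the log-probability of the tokens generated in the corresponding turn. Because every turn gets its own return rather than one score per episode, this form can consume dense per-turn rewards directly, which is the property the abstract contrasts with GRPO.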