4 months ago

Zichen Liu, Anya Sims, Keyu Duan, Changyu Chen, Simon Yu, Xiangxin Zhou, Haotian Xu, Shaopan Xiong, Bo Liu, Chenmien Tan, and 9 more

Abstract

The training paradigm for large language models (LLMs) is moving from static datasets to experience-based learning, where agents acquire skills by interacting with complex environments. To facilitate this transition, we introduce GEM (General Experience Maker), an open-source environment simulator designed for the age of LLMs. Analogous to OpenAI Gym for traditional reinforcement learning (RL), GEM provides a standardized framework for the environment-agent interface, including asynchronous vectorized execution for high throughput and flexible wrappers for easy extensibility. GEM also features a diverse suite of environments, robust integrated tools, and single-file example scripts demonstrating how to use GEM with five popular RL training frameworks. We also provide a set of baselines across 24 environments using REINFORCE with Return Batch Normalization (ReBN), which, unlike GRPO, is compatible with the full RL setting of dense per-turn rewards and offers better credit assignment. We further conduct apples-to-apples benchmarking of PPO, GRPO, and REINFORCE in both single- and multi-turn settings using GEM to shed light on the algorithmic designs. Lastly, GEM also functions as a convenient evaluation toolkit in addition to a training environment. We hope this framework can help accelerate future agentic LLM research.
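The abstract names ReBN without defining it. As a minimal sketch, assuming ReBN means computing a discounted return for every turn and then standardizing those returns across all turns in the batch, the idea could look like the following. The function names, the `gamma` and `eps` parameters, and the toy rewards are hypothetical illustrations, not part of the GEM API.

```python
from statistics import mean, pstdev


def discounted_returns(rewards, gamma=1.0):
    """Return-to-go G_t = r_t + gamma * G_{t+1} for one episode's per-turn rewards."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def batch_normalized_returns(episodes, gamma=1.0, eps=1e-8):
    """Standardize per-turn returns over every turn in the batch.

    ASSUMPTION: this is one plausible reading of "Return Batch Normalization";
    the abstract does not give the exact formula.
    """
    per_episode = [discounted_returns(ep, gamma) for ep in episodes]
    flat = [g for ep in per_episode for g in ep]
    mu, sigma = mean(flat), pstdev(flat)
    # Each turn keeps its own signal, so episodes may differ in length.
    return [[(g - mu) / (sigma + eps) for g in ep] for ep in per_episode]


# Hypothetical batch: three episodes with dense per-turn rewards.
episodes = [[0.0, 0.0, 1.0], [0.5, 0.0], [0.0]]
advantages = batch_normalized_returns(episodes, gamma=0.9)
```

Under this reading, each turn receives its own normalized signal, which is what lets the estimator work with dense per-turn rewards and episodes of different lengths. In a REINFORCE-style update, these values would weight the log-probabilities of the actions taken at the corresponding turns.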

LLM

Reinforcement Learning

Benchmarks

AI Infra

Method/Architecture

GEM: A Gym for Agentic LLMs
