Off-Policy Actor-Critic with Shared Experience Replay

Simon Schmitt, Matteo Hessel, Karen Simonyan

Abstract

We investigate the combination of actor-critic reinforcement learning algorithms with uniform large-scale experience replay and propose solutions for two challenges: (a) efficient actor-critic learning with experience replay, and (b) stability of off-policy learning where agents learn from other agents' behaviour. We employ those insights to accelerate hyper-parameter sweeps in which all participating agents run concurrently and share their experience via a common replay module. To this end, we analyze the bias-variance tradeoffs in V-trace, a form of importance sampling for actor-critic methods. Based on our analysis, we then argue for mixing experience sampled from replay with on-policy experience, and propose a new trust region scheme that scales effectively to data distributions where V-trace becomes unstable. We provide extensive empirical validation of the proposed solution. We further demonstrate the benefits of this setup with state-of-the-art data efficiency on Atari among agents trained for up to 200M environment frames.
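
For context, the following is a minimal NumPy sketch of the V-trace value targets from Espeholt et al. (2018), the quantity whose bias-variance tradeoff the abstract analyzes. The function name, the default clipping thresholds, and the omission of per-step discount and termination handling are illustrative assumptions, not details taken from this paper.

```python
import numpy as np

def vtrace_targets(rewards, values, bootstrap_value, rhos,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute V-trace value targets for a single trajectory of length T.

    rewards, values, rhos: arrays of length T, where rhos are the importance
    ratios pi(a_t|x_t) / mu(a_t|x_t) between the learner policy pi and the
    behaviour policy mu that generated the (possibly replayed) experience.
    bootstrap_value: V(x_T), the value estimate at the final state.
    """
    T = len(rewards)
    clipped_rhos = np.minimum(rho_bar, rhos)  # rho_bar controls the fixed point (bias)
    clipped_cs = np.minimum(c_bar, rhos)      # c_bar truncates the trace (variance)
    values_tp1 = np.append(values[1:], bootstrap_value)
    deltas = clipped_rhos * (rewards + gamma * values_tp1 - values)

    # Backward recursion: v_s - V(x_s) = delta_s + gamma * c_s * (v_{s+1} - V(x_{s+1}))
    vs_minus_v = np.zeros(T)
    acc = 0.0
    for t in reversed(range(T)):
        acc = deltas[t] + gamma * clipped_cs[t] * acc
        vs_minus_v[t] = acc
    return values + vs_minus_v
```

Lowering rho_bar biases the target towards the value function of the behaviour policy, while lowering c_bar shortens the effective trace and reduces variance; this is the bias-variance tradeoff referred to above, which becomes delicate when the replayed experience comes from other agents' policies.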

Benchmarks

Benchmark: atari-games-on-atari-57
Methodology: LASER
Metrics: Human World Record Breakthrough: 7; Mean Human Normalized Score: 1741.36%

Benchmark: atari-games-on-atari-games
Methodology: LASER
Metrics: Mean Human Normalized Score: 1741.36%
