Unifying Count-Based Exploration and Intrinsic Motivation
Marc G. Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Remi Munos

Abstract
We consider an agent's uncertainty about its environment and the problem of generalizing this uncertainty across observations. Specifically, we focus on the problem of exploration in non-tabular reinforcement learning. Drawing inspiration from the intrinsic motivation literature, we use density models to measure uncertainty, and propose a novel algorithm for deriving a pseudo-count from an arbitrary density model. This technique enables us to generalize count-based exploration algorithms to the non-tabular case. We apply our ideas to Atari 2600 games, providing sensible pseudo-counts from raw pixels. We transform these pseudo-counts into intrinsic rewards and obtain significantly improved exploration in a number of hard games, including the infamously difficult Montezuma's Revenge.
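As a concrete illustration of the pseudo-count construction the abstract refers to: the paper derives N̂(x) = ρ(x)(1 − ρ′(x)) / (ρ′(x) − ρ(x)), where ρ is the density model's probability of x before updating on x and ρ′ (the "recoding probability") is its probability after the update, and converts it into an intrinsic reward proportional to (N̂ + 0.01)^(−1/2). The sketch below is a minimal Python rendering of that recipe, not the authors' implementation: the `EmpiricalDensityModel` class, its `prob`/`update` interface, and the β = 0.05 scale are illustrative assumptions (the paper's Atari experiments use a CTS model over raw pixels).

```python
import math


class EmpiricalDensityModel:
    """Toy density model: the empirical distribution over a discrete alphabet.

    Illustrative stand-in; any model exposing prob(x) and update(x)
    (e.g. a CTS pixel model, as in the paper) fits this interface.
    """

    def __init__(self):
        self.counts = {}
        self.total = 0

    def prob(self, x):
        if self.total == 0:
            return 0.0
        return self.counts.get(x, 0) / self.total

    def update(self, x):
        self.counts[x] = self.counts.get(x, 0) + 1
        self.total += 1


def pseudo_count(model, x):
    """Derive a pseudo-count from an arbitrary density model.

    N_hat(x) = rho(x) * (1 - rho'(x)) / (rho'(x) - rho(x)),
    where rho is the probability of x before updating the model on x
    and rho' is the recoding probability after that update.
    Note: this call updates the model as a side effect.
    """
    rho = model.prob(x)        # probability before observing x
    model.update(x)            # learn from the new observation
    rho_prime = model.prob(x)  # recoding probability after the update
    if rho_prime <= rho:       # requires a learning-positive model (rho' > rho)
        return float("inf")
    return rho * (1.0 - rho_prime) / (rho_prime - rho)


def exploration_bonus(n_hat, beta=0.05):
    """Count-based intrinsic reward of the form beta * (N_hat + 0.01)^(-1/2)."""
    return beta / math.sqrt(n_hat + 0.01)


# With the empirical model, the pseudo-count recovers the true visit count:
model = EmpiricalDensityModel()
for obs in ["a", "b", "a", "a"]:
    n_hat = pseudo_count(model, obs)
    print(obs, n_hat, exploration_bonus(n_hat))
```

With the empirical distribution as the density model, N̂(x) reduces exactly to the number of prior visits to x (the loop above prints 0, 0, 1, 2), which is the consistency property that lets the same formula generalize count-based exploration to non-tabular settings.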
Benchmarks
| Benchmark | Method | Score |
|---|---|---|
| Atari 2600 Freeway | A3C-CTS | 30.48 |
| Atari 2600 Gravitar | A3C-CTS | 238.68 |
| Atari 2600 Montezuma's Revenge | DDQN-PC | 3459 |
| Atari 2600 Montezuma's Revenge | A3C-CTS | 273.7 |
| Atari 2600 Private Eye | A3C-CTS | 99.32 |
| Atari 2600 Venture | A3C-CTS | 0.0 |