Value Prediction Network
Junhyuk Oh; Satinder Singh; Honglak Lee

Abstract
This paper proposes a novel deep reinforcement learning (RL) architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network. In contrast to typical model-based RL methods, VPN learns a dynamics model whose abstract states are trained to make option-conditional predictions of future values (discounted sum of rewards) rather than of future observations. Our experimental results show that VPN has several advantages over both model-free and model-based baselines in a stochastic environment where careful planning is required but building an accurate observation-prediction model is difficult. Furthermore, VPN outperforms Deep Q-Network (DQN) on several Atari games even with short-lookahead planning, demonstrating its potential as a new way of learning a good state representation.
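As a rough illustration of the idea the abstract describes, the sketch below wires up the four VPN components (encoding, value, outcome, and transition modules) and a depth-limited planning backup in PyTorch. All names (`VPNCore`, `plan_q`, `plan_v`), layer sizes, and the exact backup weighting are illustrative assumptions, not the authors' architecture; see the paper for the precise d-step planning rule and training losses.

```python
# Minimal sketch of the VPN core, assuming simple MLP modules.
# The key point: the model predicts option-conditional rewards,
# discounts, and values over *abstract* states, never observations.
import torch
import torch.nn as nn

class VPNCore(nn.Module):
    def __init__(self, obs_dim: int, num_options: int, state_dim: int = 64):
        super().__init__()
        # Encoding module: observation -> abstract state (never decoded back).
        self.encode = nn.Sequential(nn.Linear(obs_dim, state_dim), nn.ReLU())
        # Value module: abstract state -> scalar value V(s).
        self.value = nn.Linear(state_dim, 1)
        # Outcome module: (s, option) -> predicted reward and discount logit.
        self.outcome = nn.Linear(state_dim + num_options, 2)
        # Transition module: (s, option) -> next abstract state s'.
        self.transition = nn.Sequential(
            nn.Linear(state_dim + num_options, state_dim), nn.ReLU())
        self.num_options = num_options

    def step(self, s, option: int):
        """Roll the abstract model forward by one option."""
        o = torch.zeros(s.size(0), self.num_options)
        o[:, option] = 1.0
        so = torch.cat([s, o], dim=-1)
        r, gamma_logit = self.outcome(so).unbind(-1)
        gamma = torch.sigmoid(gamma_logit)  # keep the discount in (0, 1)
        return self.transition(so), r, gamma

    def plan_q(self, s, option: int, d: int):
        """d-step planned Q: predicted reward plus discounted planned value."""
        s2, r, gamma = self.step(s, option)
        return r + gamma * self.plan_v(s2, d - 1)

    def plan_v(self, s, d: int):
        """d-step planned value. A shallow plan is just v(s); deeper plans
        mix v(s) with the best one-step backup (a simplified version of the
        paper's weighted d-step backup)."""
        v = self.value(s).squeeze(-1)
        if d <= 1:
            return v
        q = torch.stack([self.plan_q(s, o, d) for o in range(self.num_options)])
        return v / d + (d - 1) / d * q.max(dim=0).values
```

A hypothetical usage, selecting the option preferred by 3-step planning from a batch of encoded observations:

```python
vpn = VPNCore(obs_dim=16, num_options=4)
s = vpn.encode(torch.randn(2, 16))                 # abstract states, batch of 2
q = torch.stack([vpn.plan_q(s, o, d=3) for o in range(4)])
best_option = q.argmax(dim=0)                      # greedy option per state
```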
Benchmarks
| Benchmark (Atari 2600) | Method | Score |
|---|---|---|
| Alien | VPN | 1429 |
| Amidar | VPN | 641 |
| Crazy Climber | VPN | 54119 |
| Enduro | VPN | 382 |
| Frostbite | VPN | 3811 |
| Krull | VPN | 15930 |
| Ms. Pac-Man | VPN | 2689 |
| Q*bert | VPN | 14517 |
| Seaquest | VPN | 5628 |