Value Prediction Network
Junhyuk Oh; Satinder Singh; Honglak Lee

Abstract
This paper proposes a novel deep reinforcement learning (RL) architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network. In contrast to typical model-based RL methods, VPN learns a dynamics model whose abstract states are trained to make option-conditional predictions of future values (discounted sum of rewards) rather than of future observations. Our experimental results show that VPN has several advantages over both model-free and model-based baselines in a stochastic environment where careful planning is required but building an accurate observation-prediction model is difficult. Furthermore, VPN outperforms Deep Q-Network (DQN) on several Atari games even with short-lookahead planning, demonstrating its potential as a new way of learning a good state representation.
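As a rough illustration of the idea the abstract describes, the sketch below wires up the four VPN components (encoding, value, outcome, and transition modules) and a depth-limited planning backup in PyTorch. All names (`VPNCore`, `plan_q`, `plan_v`), layer sizes, and the exact backup weighting are illustrative assumptions, not the authors' architecture; see the paper for the precise d-step planning rule and training losses.

```python
# Minimal sketch of the VPN core, assuming simple MLP modules.
# The key point: the model predicts option-conditional rewards,
# discounts, and values over *abstract* states, never observations.
import torch
import torch.nn as nn

class VPNCore(nn.Module):
    def __init__(self, obs_dim: int, num_options: int, state_dim: int = 64):
        super().__init__()
        # Encoding module: observation -> abstract state (never decoded back).
        self.encode = nn.Sequential(nn.Linear(obs_dim, state_dim), nn.ReLU())
        # Value module: abstract state -> scalar value V(s).
        self.value = nn.Linear(state_dim, 1)
        # Outcome module: (s, option) -> predicted reward and discount logit.
        self.outcome = nn.Linear(state_dim + num_options, 2)
        # Transition module: (s, option) -> next abstract state s'.
        self.transition = nn.Sequential(
            nn.Linear(state_dim + num_options, state_dim), nn.ReLU())
        self.num_options = num_options

    def step(self, s, option: int):
        """Roll the abstract model forward by one option."""
        o = torch.zeros(s.size(0), self.num_options)
        o[:, option] = 1.0
        so = torch.cat([s, o], dim=-1)
        r, gamma_logit = self.outcome(so).unbind(-1)
        gamma = torch.sigmoid(gamma_logit)  # keep the discount in (0, 1)
        return self.transition(so), r, gamma

    def plan_q(self, s, option: int, d: int):
        """d-step planned Q: predicted reward plus discounted planned value."""
        s2, r, gamma = self.step(s, option)
        return r + gamma * self.plan_v(s2, d - 1)

    def plan_v(self, s, d: int):
        """d-step planned value. A shallow plan is just v(s); deeper plans
        mix v(s) with the best one-step backup (a simplified version of the
        paper's weighted d-step backup)."""
        v = self.value(s).squeeze(-1)
        if d <= 1:
            return v
        q = torch.stack([self.plan_q(s, o, d) for o in range(self.num_options)])
        return v / d + (d - 1) / d * q.max(dim=0).values
```

A hypothetical usage, selecting the option preferred by 3-step planning from a batch of encoded observations:

```python
vpn = VPNCore(obs_dim=16, num_options=4)
s = vpn.encode(torch.randn(2, 16))                 # abstract states, batch of 2
q = torch.stack([vpn.plan_q(s, o, d=3) for o in range(4)])
best_option = q.argmax(dim=0)                      # greedy option per state
```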
Benchmarks
| Benchmark (Atari 2600) | Method | Score |
|---|---|---|
| Alien | VPN | 1429 |
| Amidar | VPN | 641 |
| Crazy Climber | VPN | 54119 |
| Enduro | VPN | 382 |
| Frostbite | VPN | 3811 |
| Krull | VPN | 15930 |
| Ms. Pac-Man | VPN | 2689 |
| Q*bert | VPN | 14517 |
| Seaquest | VPN | 5628 |