Timothy P. Lillicrap; Jonathan J. Hunt; Alexander Pritzel; Nicolas Heess; Tom Erez; Yuval Tassa; David Silver; Daan Wierstra

Abstract
We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
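The abstract describes an actor-critic, model-free algorithm built on the deterministic policy gradient, with ideas borrowed from Deep Q-Learning (target networks, learning from mini-batches). The sketch below illustrates a single DDPG update step under those ingredients; it is an assumption-laden illustration in PyTorch, not the authors' code. Network sizes, learning rates, the `obs_dim`/`act_dim` dimensions, and the `update` helper are illustrative, and the replay buffer and exploration noise used in practice are omitted.

```python
# Minimal sketch of one DDPG update step: critic regression toward a
# bootstrapped target, deterministic policy gradient for the actor, and
# soft ("Polyak") updates of the target networks. Hyper-parameters are
# illustrative, not the paper's exact settings.
import torch
import torch.nn as nn

obs_dim, act_dim, gamma, tau = 8, 2, 0.99, 0.005

def mlp(in_dim, out_dim, out_act=nn.Identity):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                         nn.Linear(64, out_dim), out_act())

actor = mlp(obs_dim, act_dim, nn.Tanh)      # deterministic policy mu(s)
critic = mlp(obs_dim + act_dim, 1)          # action-value Q(s, a)
actor_targ = mlp(obs_dim, act_dim, nn.Tanh)
critic_targ = mlp(obs_dim + act_dim, 1)
actor_targ.load_state_dict(actor.state_dict())
critic_targ.load_state_dict(critic.state_dict())

actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def update(s, a, r, s2, done):
    # Critic: regress Q(s, a) toward the target computed with the
    # *target* networks, as in Deep Q-Learning.
    with torch.no_grad():
        q_next = critic_targ(torch.cat([s2, actor_targ(s2)], dim=-1))
        y = r + gamma * (1 - done) * q_next
    q = critic(torch.cat([s, a], dim=-1))
    critic_loss = nn.functional.mse_loss(q, y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: deterministic policy gradient -- ascend Q(s, mu(s)).
    actor_loss = -critic(torch.cat([s, actor(s)], dim=-1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Slowly track the learned networks with the target networks.
    for net, targ in ((actor, actor_targ), (critic, critic_targ)):
        for p, p_t in zip(net.parameters(), targ.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)

# Dummy mini-batch just to show the call signature.
B = 32
update(torch.randn(B, obs_dim), torch.rand(B, act_dim) * 2 - 1,
       torch.randn(B, 1), torch.randn(B, obs_dim), torch.zeros(B, 1))
```

In a full implementation this update would be called on mini-batches sampled from a replay buffer, with exploration noise added to the actor's output when acting in the environment.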
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| LunarLander (continuous control, OpenAI Gym) | DDPG | Score: 256.98 ± 14.38 |
| Ant-v4 (OpenAI Gym) | DDPG | Average Return: 1712.12 |
| HalfCheetah-v4 (OpenAI Gym) | DDPG | Average Return: 14934.86 |
| Hopper-v4 (OpenAI Gym) | DDPG | Average Return: 1290.24 |
| Humanoid-v4 (OpenAI Gym) | DDPG | Average Return: 139.14 |
| Walker2d-v4 (OpenAI Gym) | DDPG | Average Return: 2994.54 |