Addressing Function Approximation Error in Actor-Critic Methods
Scott Fujimoto; Herke van Hoof; David Meger

Abstract
In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic. Our algorithm builds on Double Q-learning, by taking the minimum value between a pair of critics to limit overestimation. We draw the connection between target networks and overestimation bias, and suggest delaying policy updates to reduce per-update error and further improve performance. We evaluate our method on the suite of OpenAI gym tasks, outperforming the state of the art in every environment tested.
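The abstract names three mechanisms: clipped Double Q-learning (taking the minimum over a pair of critics), target networks, and delayed policy updates. The sketch below illustrates how these fit together in a single update step. It is a minimal illustration in PyTorch, not the authors' released implementation: the `Actor`/`Critic` architectures, the single optimizer over both critics, and the batch format `(state, action, reward, next_state, not_done)` are assumptions chosen for brevity.

```python
# Minimal sketch of a TD3-style update (illustrative; hyperparameters and
# network sizes are assumptions, not the paper's reference code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Critic(nn.Module):
    """Single Q-network; TD3 keeps two independent copies."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )
    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=1))

class Actor(nn.Module):
    """Deterministic policy producing actions in [-max_action, max_action]."""
    def __init__(self, state_dim, action_dim, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )
        self.max_action = max_action
    def forward(self, state):
        return self.max_action * self.net(state)

def td3_update(actor, actor_target, critic1, critic2, critic1_target, critic2_target,
               actor_opt, critic_opt, batch, step,
               gamma=0.99, tau=0.005, policy_noise=0.2, noise_clip=0.5,
               policy_delay=2, max_action=1.0):
    state, action, reward, next_state, not_done = batch

    with torch.no_grad():
        # Target policy smoothing: perturb the target action with clipped noise.
        noise = (torch.randn_like(action) * policy_noise).clamp(-noise_clip, noise_clip)
        next_action = (actor_target(next_state) + noise).clamp(-max_action, max_action)
        # Clipped Double Q-learning: take the minimum of the two target critics
        # to limit overestimation.
        target_q = torch.min(critic1_target(next_state, next_action),
                             critic2_target(next_state, next_action))
        target_q = reward + not_done * gamma * target_q

    # Both critics regress toward the same (pessimistic) target.
    critic_loss = (F.mse_loss(critic1(state, action), target_q) +
                   F.mse_loss(critic2(state, action), target_q))
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Delayed policy update: refresh the actor and the target networks
    # only every `policy_delay` critic updates.
    if step % policy_delay == 0:
        actor_loss = -critic1(state, actor(state)).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()

        # Soft (Polyak) update of all target networks.
        for net, target in ((critic1, critic1_target), (critic2, critic2_target),
                            (actor, actor_target)):
            for p, tp in zip(net.parameters(), target.parameters()):
                tp.data.mul_(1 - tau).add_(tau * p.data)
```

In practice the targets are initialized as copies of their online networks (e.g. with `copy.deepcopy`) before training begins; the delayed update then keeps the policy training against a slowly moving, lower-variance value estimate.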
Benchmarks
| Benchmark | Methodology | Metric |
|---|---|---|
| LunarLander continuous control (OpenAI Gym) | TD3 | Score: 277.26 ± 4.17 |
| Ant-v4 (OpenAI Gym) | TD3 | Average return: 5942.55 |
| HalfCheetah-v4 (OpenAI Gym) | TD3 | Average return: 12026.73 |
| Hopper-v4 (OpenAI Gym) | TD3 | Average return: 3319.98 |
| Humanoid-v4 (OpenAI Gym) | TD3 | Average return: 198.44 |
| Walker2d-v4 (OpenAI Gym) | TD3 | Average return: 2612.74 |