Addressing Function Approximation Error in Actor-Critic Methods
Scott Fujimoto; Herke van Hoof; David Meger

Abstract
In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic. Our algorithm builds on Double Q-learning, by taking the minimum value between a pair of critics to limit overestimation. We draw the connection between target networks and overestimation bias, and suggest delaying policy updates to reduce per-update error and further improve performance. We evaluate our method on the suite of OpenAI gym tasks, outperforming the state of the art in every environment tested.
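The abstract names three mechanisms: clipped Double Q-learning (taking the minimum over a pair of critics), target networks, and delayed policy updates. The sketch below illustrates how these fit together in a single update step. It is a minimal illustration in PyTorch, not the authors' released implementation: the `Actor`/`Critic` architectures, the single optimizer over both critics, and the batch format `(state, action, reward, next_state, not_done)` are assumptions chosen for brevity.

```python
# Minimal sketch of a TD3-style update (illustrative; hyperparameters and
# network sizes are assumptions, not the paper's reference code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Critic(nn.Module):
    """Single Q-network; TD3 keeps two independent copies."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )
    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=1))

class Actor(nn.Module):
    """Deterministic policy producing actions in [-max_action, max_action]."""
    def __init__(self, state_dim, action_dim, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )
        self.max_action = max_action
    def forward(self, state):
        return self.max_action * self.net(state)

def td3_update(actor, actor_target, critic1, critic2, critic1_target, critic2_target,
               actor_opt, critic_opt, batch, step,
               gamma=0.99, tau=0.005, policy_noise=0.2, noise_clip=0.5,
               policy_delay=2, max_action=1.0):
    state, action, reward, next_state, not_done = batch

    with torch.no_grad():
        # Target policy smoothing: perturb the target action with clipped noise.
        noise = (torch.randn_like(action) * policy_noise).clamp(-noise_clip, noise_clip)
        next_action = (actor_target(next_state) + noise).clamp(-max_action, max_action)
        # Clipped Double Q-learning: take the minimum of the two target critics
        # to limit overestimation.
        target_q = torch.min(critic1_target(next_state, next_action),
                             critic2_target(next_state, next_action))
        target_q = reward + not_done * gamma * target_q

    # Both critics regress toward the same (pessimistic) target.
    critic_loss = (F.mse_loss(critic1(state, action), target_q) +
                   F.mse_loss(critic2(state, action), target_q))
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Delayed policy update: refresh the actor and the target networks
    # only every `policy_delay` critic updates.
    if step % policy_delay == 0:
        actor_loss = -critic1(state, actor(state)).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()

        # Soft (Polyak) update of all target networks.
        for net, target in ((critic1, critic1_target), (critic2, critic2_target),
                            (actor, actor_target)):
            for p, tp in zip(net.parameters(), target.parameters()):
                tp.data.mul_(1 - tau).add_(tau * p.data)
```

In practice the targets are initialized as copies of their online networks (e.g. with `copy.deepcopy`) before training begins; the delayed update then keeps the policy training against a slowly moving, lower-variance value estimate.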
Benchmarks
| Benchmark | Methodology | Metric |
|---|---|---|
| LunarLander continuous control (OpenAI Gym) | TD3 | Score: 277.26 ± 4.17 |
| Ant-v4 (OpenAI Gym) | TD3 | Average return: 5942.55 |
| HalfCheetah-v4 (OpenAI Gym) | TD3 | Average return: 12026.73 |
| Hopper-v4 (OpenAI Gym) | TD3 | Average return: 3319.98 |
| Humanoid-v4 (OpenAI Gym) | TD3 | Average return: 198.44 |
| Walker2d-v4 (OpenAI Gym) | TD3 | Average return: 2612.74 |