5 months ago

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

Tuomas Haarnoja; Aurick Zhou; Pieter Abbeel; Sergey Levine

Abstract

Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision-making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and brittle convergence properties, which necessitate meticulous hyperparameter tuning. Both of these challenges severely limit the applicability of such methods to complex, real-world domains. In this paper, we propose soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework. In this framework, the actor aims to maximize expected reward while also maximizing entropy, that is, to succeed at the task while acting as randomly as possible. Prior deep RL methods based on this framework have been formulated as Q-learning methods. By combining off-policy updates with a stable stochastic actor-critic formulation, our method achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods. Furthermore, we demonstrate that, in contrast to other off-policy algorithms, our approach is very stable, achieving very similar performance across different random seeds.
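
The trade-off described in the abstract, maximizing expected reward while also maximizing entropy, can be illustrated with a toy calculation. The sketch below is not the authors' implementation: it assumes a single state with three discrete actions (the paper targets continuous control), hypothetical reward values, and a temperature weight called alpha that scales the entropy term. It only shows how an entropy bonus can make a stochastic policy score higher than a deterministic one.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def soft_objective(policy, rewards, alpha):
    """Expected reward plus alpha times policy entropy, for one state.

    policy:  action probabilities pi(a|s)
    rewards: reward r(s, a) for each action (hypothetical values here)
    alpha:   assumed temperature weighting entropy against reward
    """
    expected_reward = sum(p * r for p, r in zip(policy, rewards))
    return expected_reward + alpha * entropy(policy)


rewards = [1.0, 0.9, 0.0]  # hypothetical per-action rewards
greedy = [1.0, 0.0, 0.0]   # deterministic: always picks the best action
spread = [0.5, 0.5, 0.0]   # stochastic: splits between the two good actions

for alpha in (0.0, 0.2):
    print(
        f"alpha={alpha}: greedy={soft_objective(greedy, rewards, alpha):.4f}, "
        f"spread={soft_objective(spread, rewards, alpha):.4f}"
    )
```

With alpha set to 0 the objective reduces to plain expected reward and the deterministic policy scores higher. With a positive alpha the entropy bonus can outweigh the small loss in reward, so the objective favors the policy that keeps both good actions in play.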

Code Repositories

baturaysaglam/la3p

pytorch

Mentioned in GitHub

quantumiracle/Popular-RL-Algorithms

pytorch

Mentioned in GitHub

SaminYeasar/off_policy_ac

pytorch

Mentioned in GitHub

kairproject/kair_algorithms_draft

pytorch

Mentioned in GitHub

ku2482/rljax

jax

Mentioned in GitHub

ShawK91/erl_paper_nips18

pytorch

Mentioned in GitHub

toni-sm/skrl

jax

ray-project/ray/tree/master/rllib

Kaixhin/imitation-learning

pytorch

Mentioned in GitHub

dasgringuen/assetto_corsa_gym

pytorch

Mentioned in GitHub

watchernyu/spinningup-drl-prototyping

Mentioned in GitHub

kushagra06/SAC

pytorch

Mentioned in GitHub

timoklein/car_racer

pytorch

Mentioned in GitHub

MrSyee/pg-is-all-you-need

Mentioned in GitHub

Rafael1s/Deep-Reinforcement-Learning-Udacity

pytorch

Mentioned in GitHub

polixir/NeoRL

Mentioned in GitHub

core-robotics-lab/icct

pytorch

Mentioned in GitHub

tmjeong1103/RL_with_RAY

pytorch

Mentioned in GitHub

flowersteam/rl_stats

Mentioned in GitHub

thomashirtz/pytorch-soft-actor-critic

pytorch

Mentioned in GitHub

hill-a/stable-baselines

MatthieuSarkis/Portfolio-Optimization-and-Goal-Based-Investment-with-Reinforcement-Learning

pytorch

Mentioned in GitHub

gijskoning/ReproducingCURL

pytorch

Mentioned in GitHub

ShawK91/Evolutionary-Reinforcement-Learning

pytorch

Mentioned in GitHub

BY571/Soft-Actor-Critic-and-Extensions

pytorch

AutumnWu/Streamlined-Off-Policy-Learning

Mentioned in GitHub

AmmarFayad/Behavioral-Actor-Critic

pytorch

Mentioned in GitHub

ajaysub110/rl-pytorch

pytorch

Mentioned in GitHub

dfki-ric-underactuated-lab/torque_limited_simple_pendulum

Mentioned in GitHub

ac-93/soft-actor-critic

Mentioned in GitHub

Steinheilig/Imbiss

Mentioned in GitHub

kdally/fault-tolerant-flight-control-drl

Mentioned in GitHub

Ipsedo/EvoMotion

pytorch

araffin/sbx

jax

Mentioned in GitHub

autumnwu/aggressive-q-learning-with-ensembles

Mentioned in GitHub

opendilab/DI-engine/blob/main/ding/policy/sac.py

pytorch

fdcl-gwu/gym-rotor

pytorch

Mentioned in GitHub

lollcat/Soft-Actor-Critic

Mentioned in GitHub

tarod13/SAC

pytorch

Mentioned in GitHub

tensorlayer/RLzoo

hyunin-lee/ForecasterSAC

pytorch

Mentioned in GitHub

ku2482/soft-actor-critic.pytorch

pytorch

Mentioned in GitHub

rk1998/robot-sac

Mentioned in GitHub

lanqingli1993/focal-iclr

pytorch

Mentioned in GitHub

jakegrigsby/deep_control/blob/master/deep_control/sac.py

pytorch

haarnoja/sac

Official

Mentioned in GitHub

nagisazj/idaq_public

Mentioned in GitHub

seungju-k1m/sac-td3-td7

pytorch

ikostrikov/jax-rl

jax

Mentioned in GitHub

ku2482/gail-airl-ppo.pytorch

pytorch

Mentioned in GitHub

pytorch/rl/tree/main/examples/sac

jax

X3N4/car_racer

pytorch

Mentioned in GitHub

toshikwa/discor.pytorch

pytorch

Mentioned in GitHub

RLAgent/state-marginal-matching

pytorch

Mentioned in GitHub

donamin/llc

Mentioned in GitHub

pranz24/pytorch-soft-actor-critic

pytorch

Mentioned in GitHub

cindycia/Atari-SAC-Discrete

pytorch

Mentioned in GitHub

sunfex/weighted-sac

pytorch

Mentioned in GitHub

andrejorsula/drl_grasping

pytorch

Mentioned in GitHub

FOCAL-ICLR/FOCAL-ICLR

pytorch

Mentioned in GitHub

xiuyu0000/new_papers_codes/tree/main/sac

mindspore

roythuly/obac

pytorch

Mentioned in GitHub

learn-to-race/l2r

Mentioned in GitHub

garyzyr001/rethinking-airl

pytorch

Mentioned in GitHub

lucadellalib/sac-beta

pytorch

Mentioned in GitHub

QuentinVacher-rl/SoftActorCritic-in-Cpp-using-LibTorch

pytorch

Mentioned in GitHub

yining043/SAC-discrete

Mentioned in GitHub

ku2482/discor.pytorch

pytorch

Mentioned in GitHub

facebookresearch/ReAgent

pytorch

Mentioned in GitHub

toshikwa/soft-actor-critic.pytorch

pytorch

Mentioned in GitHub

h-aboutalebi/SparceReward

pytorch

Mentioned in GitHub

tliu1997/rnac

pytorch

Mentioned in GitHub

trackmania-rl/tmrl

pytorch

Mentioned in GitHub

mxblr/DeepRLHockey

Mentioned in GitHub

moreanp/csro

pytorch

Mentioned in GitHub

marload/DeepRL-TensorFlow2

Mentioned in GitHub

ku2482/rltorch

pytorch

Mentioned in GitHub

thomashirtz/soft-actor-critic

pytorch

Mentioned in GitHub

tilkb/thermoai

Mentioned in GitHub

yhisaki/average-reward-drl

pytorch

Mentioned in GitHub

ccolas/rl_stats

Mentioned in GitHub

DLR-RM/stable-baselines3

pytorch

yimingpeng/sac-master

Mentioned in GitHub

MarsEleven/car_racer_RL

pytorch

Mentioned in GitHub

susan-amin/SparseBaseline1

pytorch

Mentioned in GitHub

facebookresearch/rl/blob/main/examples/sac/sac.py

jax

Benchmarks

Benchmark	Methodology	Metrics
continuous-control-on-lunar-lander-openai-gym	SAC	Score: 284.59±0.97
omniverse-isaac-gym-on-allegrohand	SAC	Average Return: 296.49
omniverse-isaac-gym-on-ant	SAC	Average Return: 7717.93
omniverse-isaac-gym-on-anymal	SAC	Average Return: 11.87
omniverse-isaac-gym-on-frankacabinet	SAC	Average Return: 1721.98
omniverse-isaac-gym-on-humanoid	SAC	Average Return: 4028.31
omniverse-isaac-gym-on-ingenuity	SAC	Average Return: 5301.99
openai-gym-on-ant-v4	SAC	Average Return: 5208.09
openai-gym-on-halfcheetah-v4	SAC	Average Return: 15836.04
openai-gym-on-hopper-v4	SAC	Average Return: 2882.56
openai-gym-on-humanoid-v4	SAC	Average Return: 6211.50
openai-gym-on-walker2d-v4	SAC	Average Return: 5745.27

