Proximal Policy Optimization Algorithms

John Schulman; Filip Wolski; Prafulla Dhariwal; Alec Radford; Oleg Klimov

Abstract

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between sample complexity, simplicity, and wall-time.
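
The abstract does not spell out the surrogate objective, so the following is only a minimal sketch, assuming the clipped variant described in the body of the paper: the probability ratio between the new and old policy is clipped to [1 - eps, 1 + eps], and the smaller of the clipped and unclipped terms is kept. The function names, the pure-Python list interface, and the default `clip_eps=0.2` are illustrative assumptions, not the authors' code. The second helper shows what "multiple epochs of minibatch updates" means in practice: the same sampled batch is reshuffled and reused several times before new data is collected.

```python
import math
import random


def clipped_surrogate(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. clip_eps is an assumed default.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Taking the minimum makes the objective a pessimistic bound:
        # moving the ratio outside the clip range earns no extra credit.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def minibatch_indices(n, batch_size, epochs, rng):
    """Yield shuffled index minibatches, reusing the same n samples each epoch."""
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)
        for start in range(0, n, batch_size):
            yield order[start:start + batch_size]


if __name__ == "__main__":
    # With identical policies the ratio is 1, so the objective is the mean advantage.
    print(clipped_surrogate([0.0, 0.0], [0.0, 0.0], [1.0, 3.0]))
    # Three epochs over eight samples in minibatches of four: six updates in total.
    print(sum(1 for _ in minibatch_indices(8, 4, 3, random.Random(0))))
```

In a real implementation the objective would be computed on tensors and differentiated with respect to the policy parameters; the sketch only shows the arithmetic of the clipping and the data-reuse pattern.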

Code Repositories

jfpettit/flare

pytorch

Mentioned in GitHub

sc2crazy/StarCrackRL

Mentioned in GitHub

intellisys-lab/stellaris-sc24

Mentioned in GitHub

gaetanserre/l2rpn-2022_ppo-baseline

Mentioned in GitHub

facebookresearch/Horizon

pytorch

Mentioned in GitHub

clwainwright/proximal_policy_optimization

Mentioned in GitHub

benevolentAI/guacamol_baselines

pytorch

Mentioned in GitHub

LuEE-C/PPO-Keras

Mentioned in GitHub

jsztompka/MultiAgent-PPO

pytorch

Mentioned in GitHub

nikhilbarhate99/PPO

pytorch

Mentioned in GitHub

MaximeVandegar/Papers-in-100-Lines-of-Code/tree/main/Proximal_Policy_Optimization_Algorithms

pytorch

170928/-Review-Proximal-Policy-Optimization-Algorithms

Mentioned in GitHub

JL321/Proximal-Policy-Optimization

Mentioned in GitHub

alexbaumi/PPO-Algorithms

pytorch

Mentioned in GitHub

nvlabs/gbrl_sb3

pytorch

Mentioned in GitHub

s-sd/task-amenability

Mentioned in GitHub

shuishida/soaprl

pytorch

Mentioned in GitHub

zjlab-ammi/llm4rl

pytorch

Mentioned in GitHub

tobiasemrich/SchafkopfRL

pytorch

Mentioned in GitHub

guofeng201507/ISA-IPA-2020-05-05-IS1PT-GRP-High5-StockTrading

Mentioned in GitHub

amartyamukherjee/ppo-packcooling

pytorch

Mentioned in GitHub

NACLab/robust-active-inference

jax

Mentioned in GitHub

bonniesjli/PPO-Reacher_UnityML

pytorch

Mentioned in GitHub

Nordeus/heroic-rl

Mentioned in GitHub

zsz-hst/RL_single_chase_point

Mentioned in GitHub

Gouet/Breakout-V0

Mentioned in GitHub

gwthomas/gtml

Mentioned in GitHub

Aravind-11/Multi-Agent-RL

pytorch

Mentioned in GitHub

amanda-lambda/hack-flappy-bird-drl

pytorch

Mentioned in GitHub

toni-sm/skrl

jax

ray-project/ray/tree/master/rllib

dickreuter/neuron_poker

Mentioned in GitHub

FMArduini/python-rl

Mentioned in GitHub

adamos581/ppo-keras-football

Mentioned in GitHub

jcwleo/curiosity-driven-exploration-pytorch

pytorch

Mentioned in GitHub

Aravind-11/IITM_Saastra

Mentioned in GitHub

montaserFath/Reinforcement-Learning-for-Prosthetics

Mentioned in GitHub

sirakik/mprg_fc

pytorch

Mentioned in GitHub

morikatron/PPO

Mentioned in GitHub

gstoica27/cpg_ppo

Mentioned in GitHub

MrSyee/pg-is-all-you-need

Mentioned in GitHub

Rafael1s/Deep-Reinforcement-Learning-Udacity

pytorch

Mentioned in GitHub

pytorch/rl/tree/main/examples/ppo

jax

tidedra/vl-rlhf

pytorch

Mentioned in GitHub

ASzot/ppo-pytorch

pytorch

Mentioned in GitHub

jsztompka/PPO-demo

pytorch

Mentioned in GitHub

alex-petrenko/sample-factory

pytorch

Mentioned in GitHub

bonniesjli/PPO_Reacher

pytorch

Mentioned in GitHub

alexmlamb/blocks_rl_gru_setup

pytorch

Mentioned in GitHub

seungjaeryanlee/osim-rl-helper

Mentioned in GitHub

tmjeong1103/RL_with_RAY

pytorch

Mentioned in GitHub

ifestus/rl

Mentioned in GitHub

dmiu-shell/deeprl-shell

pytorch

Mentioned in GitHub

andyljones/zonotable

Mentioned in GitHub

Khrylx/PyTorch-RL

pytorch

Mentioned in GitHub

MatteoBrentegani/PPO

Mentioned in GitHub

danelee2601/rl-based-automatic-berthing

Mentioned in GitHub

mindspore-courses/Deep-Reinforcement-Learning-Algorithms-with-MindSpore

mindspore

adik993/ppo-pytorch

pytorch

Mentioned in GitHub

DMIU-ShELL/MOSAIC

pytorch

Mentioned in GitHub

alecfilios/Training-Intelligent-game-Agents-through-Competitive-Reinforcement-Learning

pytorch

Mentioned in GitHub

nitthilan/pommerman

Mentioned in GitHub

amanda-lambda/drl-experiments

pytorch

Mentioned in GitHub

yoavalon/QuadcopterReinforcementLearning

Mentioned in GitHub

xiawenwen49/ppo

Mentioned in GitHub

siddharthverma314/proximalpolicyoptimization

pytorch

Mentioned in GitHub

SPark9625/PyTorch-Proximal-Policy-Optimization

pytorch

Mentioned in GitHub

Aravind-11/AI-Gaming

Mentioned in GitHub

hill-a/stable-baselines

gmoss1/Kaggle-Halite-IV-RL

Mentioned in GitHub

170928/-Review-Generative-Adversarial-Imitation-Learning

Mentioned in GitHub

silvialuu/DRL-2018

pytorch

Mentioned in GitHub

DavidCastilloAlvarado/Path-planning-and-Reinforcement-Learning

Mentioned in GitHub

lgerrets/rl18-curiosity

Mentioned in GitHub

downingbots/RLDonkeycar

Mentioned in GitHub

tcmxx/CNTKUnityTools

Mentioned in GitHub

BrianPulfer/PapersReimplementations

pytorch

Mentioned in GitHub

chainer/chainerrl

pytorch

Mentioned in GitHub

alirezakazemipour/ppo-rnd

pytorch

Mentioned in GitHub

BerkeleyLearnVerify/VerifAI

Mentioned in GitHub

evieq01/oodil

pytorch

Mentioned in GitHub

mit-realm/neuriss

pytorch

Mentioned in GitHub

dyabel/handson_rl

pytorch

openpsi-projects/srl

pytorch

Mentioned in GitHub

near32/regym

pytorch

Mentioned in GitHub

eladsar/rbi

pytorch

Mentioned in GitHub

nric/ProximalPolicyOptimizationContinuousKeras

Mentioned in GitHub

harruff/Senior_Project_Repository

Mentioned in GitHub

deconlabs/Binanace-trading-simulation

pytorch

Mentioned in GitHub

Zartris/TD3_continuous_control

pytorch

Mentioned in GitHub

EconomistGrant/HTFE-tensortrade

Mentioned in GitHub

mark-gluzman/NmodelPPO

Mentioned in GitHub

michael-snower/ppo

Mentioned in GitHub

CSautier/Breakout

pytorch

Mentioned in GitHub

wangshub/RL-Stock

Mentioned in GitHub

ikostrikov/pytorch-a2c-ppo-acktr-gail

pytorch

Mentioned in GitHub

SalvatoreCognetta/reasoning-agent-project

pytorch

Mentioned in GitHub

Ipsedo/EvoMotion

pytorch

ikostrikov/pytorch-rl

pytorch

Mentioned in GitHub

Gouet/Acrobot-PPO

Mentioned in GitHub

xiuyu0000/new_papers_codes/tree/main/ppo

mindspore

https://bitbucket.org/act-lab/release

Mentioned in GitHub

araffin/sbx

jax

Mentioned in GitHub

Ostyk/walk-bot

pytorch

Mentioned in GitHub

mightypirate1/PPO_homebrew

Mentioned in GitHub

fdcl-gwu/gym-rotor

pytorch

Mentioned in GitHub

ailab-pku/rl-framework

pytorch

Mentioned in GitHub

llSourcell/OpenAI_Five_vs_Dota2_Explained

pytorch

Mentioned in GitHub

liuyuezhang/pyrl

pytorch

Mentioned in GitHub

tensorlayer/RLzoo

Mentioned in GitHub

shreyesss/PPO-implementation-keras-tensorflow

Mentioned in GitHub

hmhuy0/SIM-RL

pytorch

Mentioned in GitHub

nikhilbarhate99/PPO-PyTorch

pytorch

Mentioned in GitHub

saschaschramm/Pong

Mentioned in GitHub

Stippler/cow-simulator

pytorch

Mentioned in GitHub

Gouet/PPO-gym

Mentioned in GitHub

Gouet/PPO-pytorch

pytorch

Mentioned in GitHub

decoderkurt/research_project_school_of_ai_2019

Mentioned in GitHub

Gregory-Eales/proximal-policy-optimization

pytorch

Mentioned in GitHub

GiadaSimionato/Reasoning_Agents_2020

pytorch

Mentioned in GitHub

hamishs/JAX-RL

jax

Mentioned in GitHub

Crevass/Hybrid-Agent

Mentioned in GitHub

yoavalon/Quadcopter-env

Mentioned in GitHub

rshnn/battleship

Mentioned in GitHub

jw1401/PPO-Tensorflow-2.0

Mentioned in GitHub

UesugiErii/tf2-PPO-atari

Mentioned in GitHub

jvidals09/Decentralized-and-multi-agent-control-of-Franka-Emika-Panda-robot-in-continuous-task-execution

pytorch

Mentioned in GitHub

miroblog/tf_deep_rl_trader

Mentioned in GitHub

sirakik/ppo_football

pytorch

Mentioned in GitHub

bay3s/ppo-parallel

pytorch

Mentioned in GitHub

BIT-aerial-robotics/AquaML/blob/2.1.11/AquaML/rlalgo/PPOAgent.py

2mawi2/master-thesis-experiments

Mentioned in GitHub

ku2482/gail-airl-ppo.pytorch

pytorch

Mentioned in GitHub

vheuthe/microbot_rl

Mentioned in GitHub

morikatron/GAIL_PPO

Mentioned in GitHub

Michaelrising/Prog-RL

pytorch

Mentioned in GitHub

lcswillems/torch-ac

pytorch

Mentioned in GitHub

synthlabsai/big-math

Mentioned in GitHub

automl/learna

Mentioned in GitHub

donamin/llc

Mentioned in GitHub

anthonysong98/super-mario-bros-ppo

pytorch

Mentioned in GitHub

arnomoonens/yarll

vermashresth/damage-aware-PPO

Mentioned in GitHub

InSpaceAI/RL-Zoo

Mentioned in GitHub

vcadillog/PPO-Mario-Bros-Tensorflow-2

Mentioned in GitHub

jhare96/reinforcement-learning

Mentioned in GitHub

JonasRSV/PPO

Mentioned in GitHub

takuseno/ppo

Mentioned in GitHub

reinforcement-learning-kr/pg_travel

pytorch

Mentioned in GitHub

bentrevett/pytorch-rl

pytorch

Mentioned in GitHub

hdparks/AsteroidsDeepReinforcement

pytorch

Mentioned in GitHub

amaudruz/RL_openaigym

pytorch

Mentioned in GitHub

labmlai/annotated_deep_learning_paper_implementations

pytorch

Narsil/rl-baselines

pytorch

Mentioned in GitHub

wangzhengfei0730/NIPS2018-AIforProsthetics

Mentioned in GitHub

OctopusMind/RLHF_PPO

pytorch

tuanpnm99/RLPongAgent

pytorch

Mentioned in GitHub

amzoyang/CS-221-Final-Project

pytorch

Mentioned in GitHub

OctThe16th/PPO-Keras

Mentioned in GitHub

alexbaumi/PPO-Algorithm

pytorch

Mentioned in GitHub

benpetit/cs379c

Mentioned in GitHub

liyiyuian/Deep-Learning

Mentioned in GitHub

deconlabs/Binanace_trading_simulation

pytorch

Mentioned in GitHub

inoryy/reaver

Mentioned in GitHub

deconlabs/TradingZoo-Dynamic-fee-simulation

pytorch

Mentioned in GitHub

jcwleo/random-network-distillation-pytorch

pytorch

Mentioned in GitHub

taku-y/20181125-pybullet

Mentioned in GitHub

cipher982/ppo-exploration

pytorch

Mentioned in GitHub

xtma/simple-pytorch-rl

pytorch

Mentioned in GitHub

NervanaSystems/coach

Mentioned in GitHub

facebookresearch/ReAgent

pytorch

Mentioned in GitHub

georgkruse/cleanqrl

pytorch

Mentioned in GitHub

uvipen/super-mario-bros-ppo-pytorch

pytorch

Mentioned in GitHub

goncharom/PPOv1

pytorch

Mentioned in GitHub

CSautier/PongBot

Mentioned in GitHub

microsoft/strategically_efficient_rl

Mentioned in GitHub

tatsu-lab/linguistic_calibration

pytorch

Mentioned in GitHub

marload/DeepRL-TensorFlow2

Mentioned in GitHub

emerge-lab/nocturne_lab

pytorch

Mentioned in GitHub

tilkb/thermoai

Mentioned in GitHub

shiningsunnyday/mcts-chess

pytorch

Mentioned in GitHub

juliusfrost/Research-Paper-Implementations

Mentioned in GitHub

DevSlem/AINE-DRL

pytorch

tommyvsfu1/RL-NTU

pytorch

Mentioned in GitHub

theresearchai/vehicle_routing_rl_2

pytorch

Mentioned in GitHub

jongornet14/HyperController

pytorch

Mentioned in GitHub

DLR-RM/stable-baselines3

pytorch

bay3s/ppo-rl

pytorch

Mentioned in GitHub

Benchmarks

Benchmark	Methodology	Metrics
continuous-control-on-lunar-lander-openai-gym	PPO	Score: 175.14±44.94
neural-architecture-search-on-nats-bench	PPO (Schulman et al., 2017)	Test Accuracy: 44.95
neural-architecture-search-on-nats-bench-1	PPO (Schulman et al., 2017)	Test Accuracy: 94.02
neural-architecture-search-on-nats-bench-2	PPO (Schulman et al., 2017)	Test Accuracy: 71.68
openai-gym-on-ant-v4	PPO	Average Return: 608.97
openai-gym-on-halfcheetah-v4	PPO	Average Return: 6006.11
openai-gym-on-hopper-v4	PPO	Average Return: 790.77
openai-gym-on-humanoid-v4	PPO	Average Return: 925.89
openai-gym-on-walker2d-v4	PPO	Average Return: 2739.81
