Proximal Policy Optimization Algorithms

John Schulman; Filip Wolski; Prafulla Dhariwal; Alec Radford; Oleg Klimov

Abstract

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between sample complexity, simplicity, and wall-time.
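The abstract does not spell out the surrogate objective. As a minimal sketch, assuming the clipped variant of the objective described in the paper, the two ideas above (a surrogate objective, and several epochs of minibatch updates over the same sampled data) might look as follows. The function names, the default `clip_eps=0.2`, and the use of plain Python lists are illustrative assumptions, not details taken from this page.

```python
import math
import random


def clipped_surrogate(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized).

    clip_eps=0.2 is an illustrative default, not a value from the abstract.
    """
    total = 0.0
    for nlp, olp, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(nlp - olp)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # The minimum gives a pessimistic bound: moving the ratio outside
        # the clip range cannot increase the objective.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def minibatch_epochs(n_samples, batch_size, n_epochs, seed=0):
    """Yield (epoch, indices): each epoch reshuffles and reuses the same data."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for epoch in range(n_epochs):
        rng.shuffle(indices)
        for start in range(0, n_samples, batch_size):
            yield epoch, indices[start:start + batch_size]
```

In a real training loop, each minibatch of indices would select the stored log-probabilities and advantage estimates, and a gradient ascent step would be taken on `clipped_surrogate` with respect to the policy parameters. That step is omitted here because it depends on the autodiff framework in use.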

Code Repositories

jfpettit/flare (PyTorch)
sc2crazy/StarCrackRL (TensorFlow)
facebookresearch/Horizon (PyTorch)
benevolentAI/guacamol_baselines (PyTorch)
LuEE-C/PPO-Keras
jsztompka/MultiAgent-PPO (PyTorch)
nikhilbarhate99/PPO (PyTorch)
alexbaumi/PPO-Algorithms (PyTorch)
nvlabs/gbrl_sb3 (PyTorch)
s-sd/task-amenability (TensorFlow)
shuishida/soaprl (PyTorch)
zjlab-ammi/llm4rl (PyTorch)
tobiasemrich/SchafkopfRL (PyTorch)
amartyamukherjee/ppo-packcooling (PyTorch)
NACLab/robust-active-inference (JAX)
bonniesjli/PPO-Reacher_UnityML (PyTorch)
Nordeus/heroic-rl (TensorFlow)
Gouet/Breakout-V0 (TensorFlow)
gwthomas/gtml (TensorFlow)
Aravind-11/Multi-Agent-RL (PyTorch)
amanda-lambda/hack-flappy-bird-drl (PyTorch)
dickreuter/neuron_poker
FMArduini/python-rl (TensorFlow)
adamos581/ppo-keras-football
Aravind-11/IITM_Saastra
sirakik/mprg_fc (PyTorch)
morikatron/PPO (TensorFlow)
gstoica27/cpg_ppo (TensorFlow)
MrSyee/pg-is-all-you-need
tidedra/vl-rlhf (PyTorch)
ASzot/ppo-pytorch (PyTorch)
jsztompka/PPO-demo (PyTorch)
alex-petrenko/sample-factory (PyTorch)
bonniesjli/PPO_Reacher (PyTorch)
alexmlamb/blocks_rl_gru_setup (PyTorch)
tmjeong1103/RL_with_RAY (PyTorch)
ifestus/rl (TensorFlow)
dmiu-shell/deeprl-shell (PyTorch)
andyljones/zonotable
Khrylx/PyTorch-RL (PyTorch)
MatteoBrentegani/PPO (TensorFlow)
adik993/ppo-pytorch (PyTorch)
DMIU-ShELL/MOSAIC (PyTorch)
nitthilan/pommerman
amanda-lambda/drl-experiments (PyTorch)
xiawenwen49/ppo (TensorFlow)
Aravind-11/AI-Gaming (TensorFlow)
gmoss1/Kaggle-Halite-IV-RL
silvialuu/DRL-2018 (PyTorch)
lgerrets/rl18-curiosity
downingbots/RLDonkeycar
tcmxx/CNTKUnityTools
BrianPulfer/PapersReimplementations (PyTorch)
chainer/chainerrl (PyTorch)
alirezakazemipour/ppo-rnd (PyTorch)
BerkeleyLearnVerify/VerifAI (TensorFlow)
evieq01/oodil (PyTorch)
mit-realm/neuriss (PyTorch)
openpsi-projects/srl (PyTorch)
near32/regym (PyTorch)
eladsar/rbi (PyTorch)
deconlabs/Binanace-trading-simulation (PyTorch)
Zartris/TD3_continuous_control (PyTorch)
EconomistGrant/HTFE-tensortrade (TensorFlow)
mark-gluzman/NmodelPPO
michael-snower/ppo (TensorFlow)
CSautier/Breakout (PyTorch)
wangshub/RL-Stock
ikostrikov/pytorch-a2c-ppo-acktr-gail (PyTorch)
ikostrikov/pytorch-rl (PyTorch)
Gouet/Acrobot-PPO (TensorFlow)
araffin/sbx (JAX)
Ostyk/walk-bot (PyTorch)
mightypirate1/PPO_homebrew (TensorFlow)
fdcl-gwu/gym-rotor (PyTorch)
ailab-pku/rl-framework (PyTorch)
liuyuezhang/pyrl (PyTorch)
tensorlayer/RLzoo (TensorFlow)
hmhuy0/SIM-RL (PyTorch)
nikhilbarhate99/PPO-PyTorch (PyTorch)
saschaschramm/Pong (TensorFlow)
Stippler/cow-simulator (PyTorch)
Gouet/PPO-gym (TensorFlow)
Gouet/PPO-pytorch (PyTorch)
GiadaSimionato/Reasoning_Agents_2020 (PyTorch)
hamishs/JAX-RL (JAX)
Crevass/Hybrid-Agent (TensorFlow)
yoavalon/Quadcopter-env (TensorFlow)
rshnn/battleship
jw1401/PPO-Tensorflow-2.0 (TensorFlow)
UesugiErii/tf2-PPO-atari (TensorFlow)
miroblog/tf_deep_rl_trader (TensorFlow)
sirakik/ppo_football (PyTorch)
bay3s/ppo-parallel (PyTorch)
ku2482/gail-airl-ppo.pytorch (PyTorch)
vheuthe/microbot_rl
morikatron/GAIL_PPO (TensorFlow)
Michaelrising/Prog-RL (PyTorch)
lcswillems/torch-ac (PyTorch)
synthlabsai/big-math
automl/learna (TensorFlow)
donamin/llc (TensorFlow)
anthonysong98/super-mario-bros-ppo (PyTorch)
vermashresth/damage-aware-PPO (TensorFlow)
InSpaceAI/RL-Zoo (TensorFlow)
jhare96/reinforcement-learning (TensorFlow)
JonasRSV/PPO (TensorFlow)
takuseno/ppo (TensorFlow)
reinforcement-learning-kr/pg_travel (PyTorch)
bentrevett/pytorch-rl (PyTorch)
hdparks/AsteroidsDeepReinforcement (PyTorch)
amaudruz/RL_openaigym (PyTorch)
Narsil/rl-baselines (PyTorch)
tuanpnm99/RLPongAgent (PyTorch)
amzoyang/CS-221-Final-Project (PyTorch)
OctThe16th/PPO-Keras
alexbaumi/PPO-Algorithm (PyTorch)
benpetit/cs379c (TensorFlow)
liyiyuian/Deep-Learning
deconlabs/Binanace_trading_simulation (PyTorch)
inoryy/reaver (TensorFlow)
taku-y/20181125-pybullet (TensorFlow)
cipher982/ppo-exploration (PyTorch)
xtma/simple-pytorch-rl (PyTorch)
NervanaSystems/coach (TensorFlow)
facebookresearch/ReAgent (PyTorch)
georgkruse/cleanqrl (PyTorch)
uvipen/super-mario-bros-ppo-pytorch (PyTorch)
goncharom/PPOv1 (PyTorch)
CSautier/PongBot (TensorFlow)
tatsu-lab/linguistic_calibration (PyTorch)
marload/DeepRL-TensorFlow2 (TensorFlow)
emerge-lab/nocturne_lab (PyTorch)
tilkb/thermoai (TensorFlow)
shiningsunnyday/mcts-chess (PyTorch)
tommyvsfu1/RL-NTU (PyTorch)
theresearchai/vehicle_routing_rl_2 (PyTorch)
jongornet14/HyperController (PyTorch)
bay3s/ppo-rl (PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
continuous-control-on-lunar-lander-openai-gym | PPO | Score: 175.14±44.94
neural-architecture-search-on-nats-bench | PPO (Schulman et al., 2017) | Test Accuracy: 44.95
neural-architecture-search-on-nats-bench-1 | PPO (Schulman et al., 2017) | Test Accuracy: 94.02
neural-architecture-search-on-nats-bench-2 | PPO (Schulman et al., 2017) | Test Accuracy: 71.68
openai-gym-on-ant-v4 | PPO | Average Return: 608.97
openai-gym-on-halfcheetah-v4 | PPO | Average Return: 6006.11
openai-gym-on-hopper-v4 | PPO | Average Return: 790.77
openai-gym-on-humanoid-v4 | PPO | Average Return: 925.89
openai-gym-on-walker2d-v4 | PPO | Average Return: 2739.81
