
Prioritized Experience Replay

Abstract

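Experience replay lets online reinforcement learning agents remember and reuse past experiences. In prior work, experience transitions were sampled uniformly from a replay memory. However, this approach simply replays transitions at the same frequency at which they were originally experienced, regardless of their significance. This paper develops a framework for prioritizing experience, so that important transitions are replayed more frequently and learning becomes more efficient. We apply prioritized experience replay to Deep Q-Networks (DQN), a reinforcement learning algorithm that achieved human-level performance on many Atari games. DQN with prioritized experience replay sets a new state of the art, outperforming DQN with uniform replay on 41 of 49 games.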
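The idea in the abstract can be illustrated with a minimal sketch of a proportional prioritized replay buffer. The abstract does not say how importance is measured, so this sketch assumes the priority of a transition is the magnitude of its TD error, raised to an exponent `alpha`, with importance-sampling weights controlled by `beta`. The class name, method names, and default values are hypothetical and chosen only for illustration; a linear scan is used for sampling instead of a more efficient tree structure.

```python
import random


class PrioritizedReplayBuffer:
    """Illustrative proportional prioritized replay (hypothetical names)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha      # 0 = uniform sampling, 1 = fully proportional
        self.eps = eps          # keeps every priority strictly positive
        self.transitions = []
        self.priorities = []
        self.next_slot = 0      # position overwritten once the buffer is full

    def add(self, transition):
        # Assumption: new transitions get the current maximum priority so
        # that each one is likely to be replayed at least once.
        priority = max(self.priorities, default=1.0)
        if len(self.transitions) < self.capacity:
            self.transitions.append(transition)
            self.priorities.append(priority)
        else:
            self.transitions[self.next_slot] = transition
            self.priorities[self.next_slot] = priority
        self.next_slot = (self.next_slot + 1) % self.capacity

    def sample(self, batch_size, beta=0.4, rng=random):
        # Sampling probability is proportional to priority ** alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        indices = rng.choices(range(len(probs)), weights=probs, k=batch_size)
        # Importance-sampling weights reduce the bias introduced by
        # non-uniform sampling. Simplification: they are normalised by the
        # largest weight within this batch.
        n = len(probs)
        weights = [(n * probs[i]) ** (-beta) for i in indices]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        batch = [self.transitions[i] for i in indices]
        return indices, batch, weights

    def update_priorities(self, indices, td_errors):
        # After a learning step, refresh priorities with the new TD errors.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a training loop, an agent would call `add` after each environment step, draw a batch with `sample`, scale each transition's loss by its returned weight, and then pass the fresh TD errors to `update_priorities`. Setting `alpha=0` recovers the uniform replay that the abstract compares against.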

Code repositories

Repositories mentioned on GitHub, with the framework tag where one is listed:

snhwang/p3_collab-compet (pytorch)
MathPhysSim/PER-NAF (tf)
nbopardi/smb (tf)
CharlotteMorrison/Baxter-Research (pytorch)
VictorZuanazzi/Project_RL (pytorch)
snhwang/p1_navigation_SNH (pytorch)
VasaKiDD/TD3-deep-rl-research (pytorch)
utarumo/RL_implementation (tf)
snhwang/p2-continuous-control-SNH (pytorch)
kayuksel/pytorch-ars (pytorch)
guillaumeboniface/bananaland (pytorch)
Clement-Hui/Q-Learning (pytorch)
chainer/chainerrl (pytorch)
justinmaojones/starr
dtak/hip-mdp-public (tf)
xinjinghao/sparrow-v1 (pytorch)
olonok69/Udacity_Banana_Unity (pytorch)
1jsingh/rl_navigation (pytorch)
ACampero/dopamine (tf)
tensorlayer/RLzoo (tf)
ku2482/soft-actor-critic.pytorch (pytorch)
eddynelson/dqn (tf)
rybread1/deep-rl-trex (tf)
CharlotteMorrison/Baxter-VREP (pytorch)
tphanson/xupr-drl (tf)
Adrelf/DRL-navigation (pytorch)
yusme/DDPG (tf)
KatyNTsachi/Hierarchical-RL (tf)
mindspore-courses/Rainbow-MindSpore (mindspore)
kmdanielduan/DQN_Family_PyTorch (pytorch)
xusophia/DataSciFinalProj (pytorch)
rybread1/DeepRlTrex (tf)
sunfex/weighted-sac (pytorch)
Arrabonae/openai_DDDQN (pytorch)
instadeepai/flashbax (jax)
yzheng51/rl-dino-run (pytorch)
SimonRamstedt/ddpg (tf)
HussonnoisMaxence/RL_Algorithms (pytorch)
NervanaSystems/coach (tf)
Suryavf/SelfDrivingCar (pytorch)
toshikwa/soft-actor-critic.pytorch (pytorch)
shashwatsaxena571/DRL-navigation (pytorch)
Guillaume-Cr/lunar_lander_per (pytorch)
ku2482/sac-discrete.pytorch (pytorch)
Brandon-Rozek/DeepRL
V0LsTeR/DQN_heap (tf)
SayhoKim/tetrisRL (tf)

Benchmarks

Benchmark | Method | Metric
atari-games-on-atari-2600-alien | Prior noop | Score: 4203.8
atari-games-on-atari-2600-alien | Prior hs | Score: 1334.7
atari-games-on-atari-2600-amidar | Prior hs | Score: 129.1
atari-games-on-atari-2600-amidar | Prior noop | Score: 1838.9
atari-games-on-atari-2600-assault | Prior noop | Score: 7672.1
atari-games-on-atari-2600-assault | Prior hs | Score: 6548.9
atari-games-on-atari-2600-asterix | Prior hs | Score: 22484.5
atari-games-on-atari-2600-asterix | Prior noop | Score: 31527
atari-games-on-atari-2600-asteroids | Prior noop | Score: 2654.3
atari-games-on-atari-2600-asteroids | Prior hs | Score: 1745.1
atari-games-on-atari-2600-atlantis | Prior noop | Score: 357324.0
atari-games-on-atari-2600-atlantis | Prior hs | Score: 330647.0
atari-games-on-atari-2600-bank-heist | Prior hs | Score: 876.6
atari-games-on-atari-2600-bank-heist | Prior noop | Score: 1054.6
atari-games-on-atari-2600-battle-zone | Prior hs | Score: 25520.0
atari-games-on-atari-2600-battle-zone | Prior noop | Score: 31530.0
atari-games-on-atari-2600-beam-rider | Prior noop | Score: 23384.2
atari-games-on-atari-2600-beam-rider | Prior hs | Score: 31181.3
atari-games-on-atari-2600-berzerk | Prior noop | Score: 1305.6
atari-games-on-atari-2600-berzerk | Prior hs | Score: 865.9
atari-games-on-atari-2600-bowling | Prior hs | Score: 52
atari-games-on-atari-2600-bowling | Prior noop | Score: 47.9
atari-games-on-atari-2600-boxing | Prior noop | Score: 95.6
atari-games-on-atari-2600-boxing | Prior hs | Score: 72.3
atari-games-on-atari-2600-breakout | Prior noop | Score: 373.9
atari-games-on-atari-2600-breakout | Prior hs | Score: 343.0
atari-games-on-atari-2600-centipede | Prior noop | Score: 4463.2
atari-games-on-atari-2600-centipede | Prior hs | Score: 3489.1
atari-games-on-atari-2600-chopper-command | Prior hs | Score: 4635.0
atari-games-on-atari-2600-chopper-command | Prior noop | Score: 8600.0
atari-games-on-atari-2600-crazy-climber | Prior noop | Score: 141161.0
atari-games-on-atari-2600-crazy-climber | Prior hs | Score: 127512.0
atari-games-on-atari-2600-demon-attack | Prior noop | Score: 71846.4
atari-games-on-atari-2600-demon-attack | Prior hs | Score: 61277.5
atari-games-on-atari-2600-double-dunk | Prior noop | Score: 18.5
atari-games-on-atari-2600-double-dunk | Prior hs | Score: 16.0
atari-games-on-atari-2600-enduro | Prior noop | Score: 2093.0
atari-games-on-atari-2600-enduro | Prior hs | Score: 1831.0
atari-games-on-atari-2600-fishing-derby | Prior hs | Score: 9.8
atari-games-on-atari-2600-fishing-derby | Prior noop | Score: 39.5
atari-games-on-atari-2600-freeway | Prior hs | Score: 28.9
atari-games-on-atari-2600-freeway | Prior noop | Score: 33.7
atari-games-on-atari-2600-frostbite | Prior hs | Score: 3510.0
atari-games-on-atari-2600-frostbite | Prior noop | Score: 4380.1
atari-games-on-atari-2600-gopher | Prior hs | Score: 34858.8
atari-games-on-atari-2600-gopher | Prior noop | Score: 32487.2
atari-games-on-atari-2600-gravitar | Prior noop | Score: 548.5
atari-games-on-atari-2600-gravitar | Prior hs | Score: 269.5
atari-games-on-atari-2600-hero | Prior hs | Score: 20889.9
atari-games-on-atari-2600-hero | Prior noop | Score: 23037.7
atari-games-on-atari-2600-ice-hockey | Prior hs | Score: -0.2
atari-games-on-atari-2600-ice-hockey | Prior noop | Score: 1.3
atari-games-on-atari-2600-james-bond | Prior hs | Score: 3961.0
atari-games-on-atari-2600-james-bond | Prior noop | Score: 5148.0
atari-games-on-atari-2600-kangaroo | Prior noop | Score: 16200.0
atari-games-on-atari-2600-kangaroo | Prior hs | Score: 12185.0
atari-games-on-atari-2600-krull | Prior hs | Score: 6872.8
atari-games-on-atari-2600-krull | Prior noop | Score: 9728.0
atari-games-on-atari-2600-kung-fu-master | Prior noop | Score: 39581.0
atari-games-on-atari-2600-kung-fu-master | Prior hs | Score: 31676.0
atari-games-on-atari-2600-montezumas-revenge | Prior hs | Score: 51
atari-games-on-atari-2600-ms-pacman | Prior noop | Score: 6518.7
atari-games-on-atari-2600-ms-pacman | Prior hs | Score: 1865.9
atari-games-on-atari-2600-name-this-game | Prior hs | Score: 10497.6
atari-games-on-atari-2600-name-this-game | Prior noop | Score: 12270.5
atari-games-on-atari-2600-pong | Prior hs | Score: 18.9
atari-games-on-atari-2600-pong | Prior noop | Score: 20.6
atari-games-on-atari-2600-private-eye | Prior hs | Score: 670.7
atari-games-on-atari-2600-private-eye | Prior noop | Score: 200.0
atari-games-on-atari-2600-qbert | Prior noop | Score: 16256.5
atari-games-on-atari-2600-qbert | Prior hs | Score: 9944
atari-games-on-atari-2600-river-raid | Prior hs | Score: 11807.2
atari-games-on-atari-2600-river-raid | Prior noop | Score: 14522.3
atari-games-on-atari-2600-road-runner | Prior hs | Score: 52264.0
atari-games-on-atari-2600-road-runner | Prior noop | Score: 57608.0
atari-games-on-atari-2600-robotank | Prior noop | Score: 62.6
atari-games-on-atari-2600-robotank | Prior hs | Score: 56.2
atari-games-on-atari-2600-seaquest | Prior noop | Score: 26357.8
atari-games-on-atari-2600-seaquest | Prior hs | Score: 25463.7
atari-games-on-atari-2600-space-invaders | Prior noop | Score: 2865.8
atari-games-on-atari-2600-space-invaders | Prior hs | Score: 3912.1
atari-games-on-atari-2600-star-gunner | Prior noop | Score: 63302.0
atari-games-on-atari-2600-star-gunner | Prior hs | Score: 61582.0
atari-games-on-atari-2600-tennis | Prior hs | Score: -5.3
atari-games-on-atari-2600-tennis | Prior noop | Score: 0.0
atari-games-on-atari-2600-time-pilot | Prior hs | Score: 5963.0
atari-games-on-atari-2600-time-pilot | Prior noop | Score: 9197.0
atari-games-on-atari-2600-tutankham | Prior noop | Score: 204.6
atari-games-on-atari-2600-tutankham | Prior hs | Score: 56.9
atari-games-on-atari-2600-up-and-down | Prior noop | Score: 16154.1
atari-games-on-atari-2600-up-and-down | Prior hs | Score: 12157.4
atari-games-on-atari-2600-venture | Prior hs | Score: 94.0
atari-games-on-atari-2600-venture | Prior noop | Score: 54.0
atari-games-on-atari-2600-video-pinball | Prior hs | Score: 295972.8
atari-games-on-atari-2600-video-pinball | Prior noop | Score: 282007.3
atari-games-on-atari-2600-wizard-of-wor | Prior noop | Score: 4802.0
atari-games-on-atari-2600-wizard-of-wor | Prior hs | Score: 5727.0
atari-games-on-atari-2600-zaxxon | Prior hs | Score: 9474.0
atari-games-on-atari-2600-zaxxon | Prior noop | Score: 10469.0
