DNA: Proximal Policy Optimization with a Dual Network Architecture

Abstract

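This paper examines the problem of learning a value function and a policy simultaneously in deep actor-critic reinforcement learning models. We find that the common practice of learning the two tasks jointly is suboptimal, because their noise levels differ by an order of magnitude. Instead, we show that learning the two tasks independently, combined with a constrained distillation phase, significantly improves performance. We further find that a lower-variance return estimate reduces the noise in the policy gradient, whereas a lower-bias return estimate reduces the noise in value function learning. Building on these insights, we propose an extension of Proximal Policy Optimization (PPO), called Dual Network Architecture (DNA), which significantly outperforms its predecessor. On four of the five environments tested, DNA also exceeds the performance of the widely used Rainbow DQN algorithm, even under the more challenging stochastic control setting.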
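The two ideas in the abstract, separate return estimates for the policy and the value function and a constrained distillation step, can be illustrated with a small dependency-free sketch. Everything here is an assumption made for illustration: the function names, the λ and γ values, and the reading of "constrained distillation" as a value-matching loss plus a KL penalty that keeps the policy close to its previous outputs. It is not taken from the maitchison/PPO code.

```python
import math


def gae_advantages(rewards, values, gamma, lam):
    """Generalized Advantage Estimation over one non-terminating rollout segment.

    `values` has len(rewards) + 1 entries; the last one bootstraps the tail.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # one-step TD error
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def distillation_loss(policy_net_values, value_net_values, old_probs, new_probs, beta):
    """Hypothetical distillation objective: MSE between the two networks'
    value predictions plus beta * KL(old policy || new policy)."""
    n = len(value_net_values)
    mse = sum((p - v) ** 2 for p, v in zip(policy_net_values, value_net_values)) / n
    kl = sum(
        sum(po * math.log(po / pn) for po, pn in zip(old, new) if po > 0.0)
        for old, new in zip(old_probs, new_probs)
    ) / n
    return mse + beta * kl


# Illustrative values only (assumptions, not stated in the abstract).
GAMMA = 0.99
LAMBDA_POLICY = 0.8   # shorter effective horizon: lower variance, more bias
LAMBDA_VALUE = 0.95   # longer effective horizon: lower bias, more variance

rewards = [0.0, 0.0, 1.0, 0.0]
values = [0.5, 0.6, 0.9, 0.2, 0.1]  # V(s_0..s_4); the last entry is the bootstrap

# Policy update uses the lower-variance advantage estimate.
policy_adv = gae_advantages(rewards, values, GAMMA, LAMBDA_POLICY)

# Value update regresses toward TD(lambda) returns built with the higher lambda.
value_adv = gae_advantages(rewards, values, GAMMA, LAMBDA_VALUE)
value_targets = [a + v for a, v in zip(value_adv, values[:-1])]
```

In a two-network setup along these lines, `policy_adv` would feed the PPO policy loss, `value_targets` would train the separate value network, and `distillation_loss` would be minimized on the policy network afterwards so that it absorbs value information without its action distribution drifting.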

Code Repository

maitchison/PPO (official, PyTorch)

Benchmarks

Benchmark                                     Method  Score
atari-games-on-atari-2600-alien               DNA     5021
atari-games-on-atari-2600-amidar              DNA     1025
atari-games-on-atari-2600-assault             DNA     16293
atari-games-on-atari-2600-asterix             DNA     323965
atari-games-on-atari-2600-asteroids           DNA     165973
atari-games-on-atari-2600-atlantis            DNA     932559
atari-games-on-atari-2600-bank-heist          DNA     1286
atari-games-on-atari-2600-battle-zone         DNA     71003
atari-games-on-atari-2600-beam-rider          DNA     20393
atari-games-on-atari-2600-berzerk             DNA     19789
atari-games-on-atari-2600-bowling             DNA     181
atari-games-on-atari-2600-boxing              DNA     99.9
atari-games-on-atari-2600-breakout            DNA     626
atari-games-on-atari-2600-centipede           DNA     100194
atari-games-on-atari-2600-chopper-command     DNA     31181
atari-games-on-atari-2600-crazy-climber       DNA     131623
atari-games-on-atari-2600-defender            DNA     152768
atari-games-on-atari-2600-demon-attack        DNA     97909
atari-games-on-atari-2600-double-dunk         DNA     -1.3
atari-games-on-atari-2600-enduro              DNA     2059
atari-games-on-atari-2600-fishing-derby       DNA     57.4
atari-games-on-atari-2600-freeway             DNA     33
atari-games-on-atari-2600-frostbite           DNA     320
atari-games-on-atari-2600-gopher              DNA     80104
atari-games-on-atari-2600-gravitar            DNA     2190
atari-games-on-atari-2600-hero                DNA     24904
atari-games-on-atari-2600-ice-hockey          DNA     7.2
atari-games-on-atari-2600-james-bond          DNA     14102
atari-games-on-atari-2600-kangaroo            DNA     14373
atari-games-on-atari-2600-krull               DNA     10956
atari-games-on-atari-2600-kung-fu-master      DNA     110962
atari-games-on-atari-2600-montezumas-revenge  DNA     0
atari-games-on-atari-2600-ms-pacman           DNA     5894
atari-games-on-atari-2600-name-this-game      DNA     20226
atari-games-on-atari-2600-phoenix             DNA     391085
atari-games-on-atari-2600-pitfall             DNA     0
atari-games-on-atari-2600-pong                DNA     19.7
atari-games-on-atari-2600-private-eye         DNA     100
atari-games-on-atari-2600-qbert               DNA     52398
atari-games-on-atari-2600-river-raid          DNA     16789
atari-games-on-atari-2600-road-runner         DNA     61713
atari-games-on-atari-2600-robotank            DNA     64.8
atari-games-on-atari-2600-seaquest            DNA     4146
atari-games-on-atari-2600-skiing              DNA     -29974
atari-games-on-atari-2600-solaris             DNA     2225
atari-games-on-atari-2600-space-invaders      DNA     2731
atari-games-on-atari-2600-star-gunner         DNA     104125
atari-games-on-atari-2600-surround            DNA     5.3
atari-games-on-atari-2600-tennis              DNA     -10.9
atari-games-on-atari-2600-time-pilot          DNA     12774
atari-games-on-atari-2600-tutankham           DNA     127
atari-games-on-atari-2600-up-and-down         DNA     291934
atari-games-on-atari-2600-venture             DNA     0
atari-games-on-atari-2600-video-pinball       DNA     505392
atari-games-on-atari-2600-wizard-of-wor       DNA     20851
atari-games-on-atari-2600-yars-revenge        DNA     564513
atari-games-on-atari-2600-zaxxon              DNA     22588
