Recurrent Experience Replay in Distributed Reinforcement Learning

Abstract

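Building on the recent successes of distributed training of reinforcement learning (RL) agents, this paper investigates the training of RNN-based RL agents from distributed prioritized experience replay. We study the effects of parameter lag, which leads to representational drift and recurrent state staleness, and empirically derive an improved training strategy. Using a single network architecture and a fixed set of hyperparameters, the resulting agent, Recurrent Replay Distributed DQN (R2D2), quadruples the previous state of the art on Atari-57 and surpasses the state of the art on DMLab-30. It is the first agent to exceed human-level performance in 52 of the 57 Atari games.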
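The abstract names recurrent state staleness but does not spell out the remedy. One idea associated with this line of work is a burn-in prefix: the stored recurrent state is first unrolled over the opening steps of a replayed sequence with the current parameters, and only the remaining steps are used for the update. The sketch below is a toy illustration under stated assumptions, not the paper's implementation: the scalar tanh cell, the parameter tuple, and the function names `rnn_step` and `unroll_with_burn_in` are all hypothetical.

```python
import math


def rnn_step(params, h, x):
    """One step of a scalar tanh RNN cell (a hypothetical stand-in for the agent's recurrent core)."""
    w_x, w_h, b = params
    return math.tanh(w_x * x + w_h * h + b)


def unroll_with_burn_in(params, stored_state, observations, burn_in):
    """Refresh a stored recurrent state on a burn-in prefix, then unroll the rest.

    Returns the refreshed start state and the hidden states of the
    training portion. In a real agent only the training portion would
    contribute to the loss; the burn-in steps only produce a start state.
    """
    h = stored_state
    for x in observations[:burn_in]:
        h = rnn_step(params, h, x)
    start_state = h

    train_states = []
    for x in observations[burn_in:]:
        h = rnn_step(params, h, x)
        train_states.append(h)
    return start_state, train_states


# Hypothetical parameters and replayed sequence, for illustration only.
params = (0.5, 0.9, 0.0)
observations = [0.1, -0.2, 0.3, 0.4, -0.5]

# Two candidate start states: one "stale" (written under old parameters)
# and one "fresh". Burn-in pulls them closer together before training begins.
stale_start, _ = unroll_with_burn_in(params, 2.0, observations, burn_in=2)
fresh_start, _ = unroll_with_burn_in(params, 0.5, observations, burn_in=2)
gap_before = abs(2.0 - 0.5)
gap_after = abs(stale_start - fresh_start)
print(gap_after < gap_before)
```

In this toy cell the gap shrinks because tanh is 1-Lipschitz and the recurrent weight has magnitude below one; a real recurrent network gives no such guarantee, which is why the effect has to be checked empirically.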

Benchmarks

Benchmark | Method | Metric
atari-games-on-atari-2600-alien | R2D2 | Score: 229496.9
atari-games-on-atari-2600-amidar | R2D2 | Score: 29321.4
atari-games-on-atari-2600-assault | R2D2 | Score: 108197.0
atari-games-on-atari-2600-asterix | R2D2 | Score: 999153.3
atari-games-on-atari-2600-asteroids | R2D2 | Score: 357867.7
atari-games-on-atari-2600-atlantis | R2D2 | Score: 1620764.0
atari-games-on-atari-2600-bank-heist | R2D2 | Score: 24235.9
atari-games-on-atari-2600-battle-zone | R2D2 | Score: 751880.0
atari-games-on-atari-2600-beam-rider | R2D2 | Score: 188257.4
atari-games-on-atari-2600-berzerk | R2D2 | Score: 53318.7
atari-games-on-atari-2600-bowling | R2D2 | Score: 219.5
atari-games-on-atari-2600-boxing | R2D2 | Score: 98.5
atari-games-on-atari-2600-breakout | R2D2 | Score: 837.7
atari-games-on-atari-2600-centipede | R2D2 | Score: 599140.3
atari-games-on-atari-2600-chopper-command | R2D2 | Score: 986652.0
atari-games-on-atari-2600-crazy-climber | R2D2 | Score: 366690.7
atari-games-on-atari-2600-defender | R2D2 | Score: 665792.0
atari-games-on-atari-2600-demon-attack | R2D2 | Score: 140002.3
atari-games-on-atari-2600-double-dunk | R2D2 | Score: 23.7
atari-games-on-atari-2600-enduro | R2D2 | Score: 2372.7
atari-games-on-atari-2600-fishing-derby | R2D2 | Score: 85.8
atari-games-on-atari-2600-freeway | R2D2 | Score: 32.5
atari-games-on-atari-2600-frostbite | R2D2 | Score: 315456.4
atari-games-on-atari-2600-gopher | R2D2 | Score: 124776.3
atari-games-on-atari-2600-gravitar | R2D2 | Score: 15680.7
atari-games-on-atari-2600-hero | R2D2 | Score: 39537.1
atari-games-on-atari-2600-ice-hockey | R2D2 | Score: 79.3
atari-games-on-atari-2600-james-bond | R2D2 | Score: 25354.0
atari-games-on-atari-2600-kangaroo | R2D2 | Score: 14130.7
atari-games-on-atari-2600-krull | R2D2 | Score: 218448.1
atari-games-on-atari-2600-kung-fu-master | R2D2 | Score: 233413.3
atari-games-on-atari-2600-montezumas-revenge | R2D2 | Score: 2061.3
atari-games-on-atari-2600-ms-pacman | R2D2 | Score: 42281.7
atari-games-on-atari-2600-name-this-game | R2D2 | Score: 58182.7
atari-games-on-atari-2600-phoenix | R2D2 | Score: 864020.0
atari-games-on-atari-2600-pitfall | R2D2 | Score: 0.0
atari-games-on-atari-2600-pong | R2D2 | Score: 21.0
atari-games-on-atari-2600-private-eye | R2D2 | Score: 5322.7
atari-games-on-atari-2600-qbert | R2D2 | Score: 408850.0
atari-games-on-atari-2600-river-raid | R2D2 | Score: 45632.1
atari-games-on-atari-2600-road-runner | R2D2 | Score: 599246.7
atari-games-on-atari-2600-robotank | R2D2 | Score: 100.4
atari-games-on-atari-2600-seaquest | R2D2 | Score: 999996.7
atari-games-on-atari-2600-skiing | R2D2 | Score: -30021.7
atari-games-on-atari-2600-solaris | R2D2 | Score: 3787.2
atari-games-on-atari-2600-space-invaders | R2D2 | Score: 43223.4
atari-games-on-atari-2600-star-gunner | R2D2 | Score: 717344.0
atari-games-on-atari-2600-surround | R2D2 | Score: 9.9
atari-games-on-atari-2600-tennis | R2D2 | Score: -0.1
atari-games-on-atari-2600-time-pilot | R2D2 | Score: 445377.3
atari-games-on-atari-2600-tutankham | R2D2 | Score: 395.3
atari-games-on-atari-2600-up-and-down | R2D2 | Score: 589226.9
atari-games-on-atari-2600-venture | R2D2 | Score: 1970.7
atari-games-on-atari-2600-video-pinball | R2D2 | Score: 999383.2
atari-games-on-atari-2600-wizard-of-wor | R2D2 | Score: 144362.7
atari-games-on-atari-2600-yars-revenge | R2D2 | Score: 995048.4
atari-games-on-atari-2600-zaxxon | R2D2 | Score: 224910.7
atari-games-on-atari-57 | R2D2 | Human World Record Breakthrough: 15; Mean Human Normalized Score: 3374.31%
atari-games-on-atari-game | R2D2 | Human World Record Breakthrough: 15
atari-games-on-atari-games | R2D2 | Mean Human Normalized Score: 3374.31%
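
The listing reports a mean human-normalized score without defining it. In the Atari literature this metric is commonly computed per game as the agent's score rescaled so that a random policy maps to 0% and a human reference maps to 100%, then averaged across games. The sketch below assumes that convention; the function names and all baseline numbers are hypothetical placeholders, not values from this page or from the paper.

```python
def human_normalized_score(agent, random_baseline, human_baseline):
    """Return the agent score as a percentage of the random-to-human gap."""
    if human_baseline == random_baseline:
        raise ValueError("human and random baselines must differ")
    return 100.0 * (agent - random_baseline) / (human_baseline - random_baseline)


def mean_human_normalized_score(rows):
    """Average the per-game normalized scores over (agent, random, human) rows."""
    scores = [human_normalized_score(a, r, h) for a, r, h in rows]
    return sum(scores) / len(scores)


# Hypothetical (agent, random, human) triples for two made-up games.
rows = [
    (150.0, 50.0, 100.0),  # agent is twice the human gap above random: 200%
    (30.0, 10.0, 50.0),    # agent covers half of the human gap: 50%
]
print(mean_human_normalized_score(rows))  # 125.0
```

Because the mean is unbounded above, a few games with very large normalized scores can dominate it, which is one reason benchmark pages often report more than one aggregate.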
