
Massively Parallel Methods for Deep Reinforcement Learning


Abstract

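We present the first massively distributed architecture for deep reinforcement learning. The architecture has four main components: parallel actors that generate new behaviour; parallel learners that are trained from stored experience; a distributed neural network that represents the value function or behaviour policy; and a distributed store of experience. We used the architecture to implement the Deep Q-Network algorithm (DQN). Our distributed algorithm was applied to 49 Atari 2600 games from the Arcade Learning Environment, using identical hyperparameters for every game. Its performance surpassed non-distributed DQN in 41 of the 49 games, and on most games it reduced the wall-clock time needed to reach these results by about an order of magnitude.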
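The four components named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes threads in place of separate machines, a Q-table in place of the distributed neural network, and a five-state toy chain in place of an Atari game. All names here (`ParameterServer`, `ReplayMemory`, `actor`, `learner`, `run`, `env_step`) are hypothetical and chosen only for illustration.

```python
import random
import threading
import time
from collections import deque

N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9  # toy chain MDP (assumption), not an Atari game


class ParameterServer:
    """Shared Q-function parameters; a Q-table stands in for the distributed network."""

    def __init__(self):
        self.q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
        self.updates = 0
        self._lock = threading.Lock()

    def pull(self):
        with self._lock:
            return [row[:] for row in self.q]

    def push(self, grads, lr=0.1):
        with self._lock:
            for (s, a), g in grads.items():
                self.q[s][a] -= lr * g
            self.updates += 1


class ReplayMemory:
    """Bounded, thread-safe experience store shared by all actors and learners."""

    def __init__(self, capacity=10_000):
        self._data = deque(maxlen=capacity)
        self._lock = threading.Lock()

    def add(self, transition):
        with self._lock:
            self._data.append(transition)

    def sample(self, k, rng):
        with self._lock:
            return rng.sample(list(self._data), min(k, len(self._data)))

    def __len__(self):
        with self._lock:
            return len(self._data)


def env_step(state, action):
    """Action 1 moves right, action 0 moves left; reaching the right end pays 1."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def actor(server, memory, steps, seed, epsilon=0.3, sync_every=20):
    """Generate behaviour with an epsilon-greedy policy and store every transition."""
    rng = random.Random(seed)
    state = 0
    for t in range(steps):
        if t % sync_every == 0:
            q = server.pull()  # refresh the local copy of the parameters
        best = max(q[state])
        greedy = [a for a in range(N_ACTIONS) if q[state][a] == best]
        action = rng.randrange(N_ACTIONS) if rng.random() < epsilon else rng.choice(greedy)
        nxt, reward, done = env_step(state, action)
        memory.add((state, action, reward, nxt, done))
        state = 0 if done else nxt


def learner(server, memory, updates, seed, batch=32, target_sync=10, min_replay=500):
    """Sample stored experience, compute DQN-style gradients, send them to the server."""
    rng = random.Random(seed)
    while len(memory) < min_replay:
        time.sleep(0.001)  # wait until the actors have produced enough experience
    target = server.pull()
    for u in range(updates):
        if u % target_sync == 0:
            target = server.pull()  # periodically refreshed target parameters
        q, grads = server.pull(), {}
        minibatch = memory.sample(batch, rng)
        for s, a, r, s2, done in minibatch:
            y = r if done else r + GAMMA * max(target[s2])  # one-step Q-learning target
            grads[(s, a)] = grads.get((s, a), 0.0) + (q[s][a] - y) / len(minibatch)
        server.push(grads)


def run(n_actors=4, n_learners=2, actor_steps=2000, learner_updates=300):
    """Start all actors and learners together; actors must produce >= min_replay steps."""
    server, memory = ParameterServer(), ReplayMemory()
    threads = [threading.Thread(target=actor, args=(server, memory, actor_steps, i))
               for i in range(n_actors)]
    threads += [threading.Thread(target=learner, args=(server, memory, learner_updates, 100 + i))
                for i in range(n_learners)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server, memory
```

Calling `server, memory = run()` starts four actors and two learners and returns once every thread has finished; `server.q` then holds the learned action values. The sketch keeps the division of labour described above: actors only write to the experience store, learners only read from it, and both exchange parameters solely through the server.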

Code Repositories

londoed/Kortex (tf; mentioned on GitHub)
nandomp/AICollaboratory (mentioned on GitHub)
londoed/Gorila (tf; mentioned on GitHub)

Benchmarks

Benchmark | Method | Score
atari-games-on-atari-2600-alien | Gorila | 813.5
atari-games-on-atari-2600-amidar | Gorila | 189.2
atari-games-on-atari-2600-assault | Gorila | 1195.8
atari-games-on-atari-2600-asterix | Gorila | 3324.7
atari-games-on-atari-2600-asteroids | Gorila | 933.6
atari-games-on-atari-2600-atlantis | Gorila | 629166.5
atari-games-on-atari-2600-bank-heist | Gorila | 399.4
atari-games-on-atari-2600-battle-zone | Gorila | 19938.0
atari-games-on-atari-2600-beam-rider | Gorila | 3822.1
atari-games-on-atari-2600-bowling | Gorila | 54
atari-games-on-atari-2600-boxing | Gorila | 74.2
atari-games-on-atari-2600-breakout | Gorila | 313.0
atari-games-on-atari-2600-centipede | Gorila | 6296.9
atari-games-on-atari-2600-chopper-command | Gorila | 3191.8
atari-games-on-atari-2600-crazy-climber | Gorila | 65451.0
atari-games-on-atari-2600-demon-attack | Gorila | 14880.1
atari-games-on-atari-2600-double-dunk | Gorila | -11.3
atari-games-on-atari-2600-enduro | Gorila | 71.0
atari-games-on-atari-2600-fishing-derby | Gorila | 4.6
atari-games-on-atari-2600-freeway | Gorila | 10.2
atari-games-on-atari-2600-frostbite | Gorila | 426.6
atari-games-on-atari-2600-gopher | Gorila | 4373.0
atari-games-on-atari-2600-gravitar | Gorila | 538.4
atari-games-on-atari-2600-hero | Gorila | 8963.4
atari-games-on-atari-2600-ice-hockey | Gorila | -1.7
atari-games-on-atari-2600-james-bond | Gorila | 444.0
atari-games-on-atari-2600-kangaroo | Gorila | 1431.0
atari-games-on-atari-2600-krull | Gorila | 6363.1
atari-games-on-atari-2600-kung-fu-master | Gorila | 20620.0
atari-games-on-atari-2600-montezumas-revenge | Gorila | 84
atari-games-on-atari-2600-ms-pacman | Gorila | 1263.0
atari-games-on-atari-2600-name-this-game | Gorila | 9238.5
atari-games-on-atari-2600-pong | Gorila | 16.7
atari-games-on-atari-2600-private-eye | Gorila | 2598.6
atari-games-on-atari-2600-qbert | Gorila | 7089.8
atari-games-on-atari-2600-river-raid | Gorila | 5310.3
atari-games-on-atari-2600-road-runner | Gorila | 43079.8
atari-games-on-atari-2600-robotank | Gorila | 61.8
atari-games-on-atari-2600-seaquest | Gorila | 10145.9
atari-games-on-atari-2600-space-invaders | Gorila | 1183.3
atari-games-on-atari-2600-star-gunner | Gorila | 14919.2
atari-games-on-atari-2600-tennis | Gorila | -0.7
atari-games-on-atari-2600-time-pilot | Gorila | 8267.8
atari-games-on-atari-2600-tutankham | Gorila | 118.5
atari-games-on-atari-2600-up-and-down | Gorila | 8747.7
atari-games-on-atari-2600-venture | Gorila | 523.4
atari-games-on-atari-2600-video-pinball | Gorila | 112093.4
atari-games-on-atari-2600-wizard-of-wor | Gorila | 10431.0
atari-games-on-atari-2600-zaxxon | Gorila | 6159.4
