4 months ago

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

Abstract

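In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is handling the increased amount of data and the extended training time. We have developed a new distributed agent, IMPALA (Importance Weighted Actor-Learner Architecture), that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation. We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V-trace. We demonstrate the effectiveness of IMPALA for multi-task reinforcement learning on DMLab-30 (a set of 30 tasks from the DeepMind Lab environment (Beattie et al., 2016)) and Atari-57 (all available Atari games in the Arcade Learning Environment (Bellemare et al., 2013a)). Our results show that IMPALA achieves better performance than previous agents with less data and, crucially, exhibits positive transfer between tasks as a result of its multi-task approach.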
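The abstract names V-trace as the off-policy correction but does not spell it out. The following is a minimal sketch, assuming the commonly cited form of the V-trace target: importance ratios between the learner (target) policy and the actor (behaviour) policy are truncated at `rho_bar` and `c_bar`, and the targets are accumulated backwards over one trajectory. The function name, argument names, and default values are illustrative assumptions, not taken from this page.

```python
import math


def vtrace_targets(behaviour_logp, target_logp, rewards, values,
                   bootstrap_value, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Return V-trace value targets for one trajectory of length T.

    behaviour_logp / target_logp: log-probabilities of the taken actions
    under the actor (behaviour) and learner (target) policies.
    values: learner value estimates V(x_0..x_{T-1}).
    bootstrap_value: V(x_T), used to bootstrap past the last step.
    All names and defaults here are assumptions for illustration.
    """
    T = len(rewards)
    # Importance ratios pi(a|x) / mu(a|x), recovered from log-probabilities.
    ratios = [math.exp(t - b) for t, b in zip(target_logp, behaviour_logp)]
    rhos = [min(rho_bar, r) for r in ratios]  # truncated weights on the TD error
    cs = [min(c_bar, r) for r in ratios]      # truncated "trace" coefficients
    next_values = list(values[1:]) + [bootstrap_value]
    # Importance-weighted temporal-difference errors.
    deltas = [rho * (r + gamma * nv - v)
              for rho, r, nv, v in zip(rhos, rewards, next_values, values)]
    # Backward recursion:
    # v_s - V(x_s) = delta_s + gamma * c_s * (v_{s+1} - V(x_{s+1})).
    acc = 0.0
    targets = [0.0] * T
    for s in reversed(range(T)):
        acc = deltas[s] + gamma * cs[s] * acc
        targets[s] = values[s] + acc
    return targets
```

When the two policies agree (all ratios equal 1) and both truncation levels are 1, the targets reduce to ordinary n-step bootstrapped returns, which is a convenient sanity check for an implementation like this one.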

Benchmarks

| Benchmark | Method | Metrics |
| --- | --- | --- |
| atari-games-on-atari-2600-alien | IMPALA (deep) | Score: 15962.10 |
| atari-games-on-atari-2600-amidar | IMPALA (deep) | Score: 1554.79 |
| atari-games-on-atari-2600-assault | IMPALA (deep) | Score: 19148.47 |
| atari-games-on-atari-2600-asterix | IMPALA (deep) | Score: 300732.00 |
| atari-games-on-atari-2600-asteroids | IMPALA (deep) | Score: 108590.05 |
| atari-games-on-atari-2600-atlantis | IMPALA (deep) | Score: 849967.50 |
| atari-games-on-atari-2600-bank-heist | IMPALA (deep) | Score: 1223.15 |
| atari-games-on-atari-2600-battle-zone | IMPALA (deep) | Score: 20885.00 |
| atari-games-on-atari-2600-beam-rider | IMPALA (deep) | Score: 32463.47 |
| atari-games-on-atari-2600-berzerk | IMPALA (deep) | Score: 1852.70 |
| atari-games-on-atari-2600-bowling | IMPALA (deep) | Score: 59.92 |
| atari-games-on-atari-2600-boxing | IMPALA (deep) | Score: 99.96 |
| atari-games-on-atari-2600-breakout | IMPALA (deep) | Score: 787.34 |
| atari-games-on-atari-2600-centipede | IMPALA (deep) | Score: 11049.75 |
| atari-games-on-atari-2600-chopper-command | IMPALA (deep) | Score: 28255.00 |
| atari-games-on-atari-2600-crazy-climber | IMPALA (deep) | Score: 136950.00 |
| atari-games-on-atari-2600-defender | IMPALA (deep) | Score: 185203.00 |
| atari-games-on-atari-2600-demon-attack | IMPALA (deep) | Score: 132826.98 |
| atari-games-on-atari-2600-double-dunk | IMPALA (deep) | Score: -0.33 |
| atari-games-on-atari-2600-enduro | IMPALA (deep) | Score: 0.00 |
| atari-games-on-atari-2600-fishing-derby | IMPALA (deep) | Score: 44.85 |
| atari-games-on-atari-2600-freeway | IMPALA (deep) | Score: 0.00 |
| atari-games-on-atari-2600-frostbite | IMPALA (deep) | Score: 317.75 |
| atari-games-on-atari-2600-gopher | IMPALA (deep) | Score: 66782.30 |
| atari-games-on-atari-2600-gravitar | IMPALA (deep) | Score: 359.50 |
| atari-games-on-atari-2600-hero | IMPALA (deep) | Score: 33730.55 |
| atari-games-on-atari-2600-ice-hockey | IMPALA (deep) | Score: 3.48 |
| atari-games-on-atari-2600-james-bond | IMPALA (deep) | Score: 601.50 |
| atari-games-on-atari-2600-kangaroo | IMPALA (deep) | Score: 1632.00 |
| atari-games-on-atari-2600-krull | IMPALA (deep) | Score: 8147.40 |
| atari-games-on-atari-2600-kung-fu-master | IMPALA (deep) | Score: 43375.50 |
| atari-games-on-atari-2600-montezumas-revenge | IMPALA (deep) | Score: 0.00 |
| atari-games-on-atari-2600-ms-pacman | IMPALA (deep) | Score: 7342.32 |
| atari-games-on-atari-2600-name-this-game | IMPALA (deep) | Score: 21537.20 |
| atari-games-on-atari-2600-phoenix | IMPALA (deep) | Score: 210996.45 |
| atari-games-on-atari-2600-pitfall | IMPALA (deep) | Score: -1.66 |
| atari-games-on-atari-2600-pong | IMPALA (deep) | Score: 20.98 |
| atari-games-on-atari-2600-private-eye | IMPALA (deep) | Score: 98.50 |
| atari-games-on-atari-2600-qbert | IMPALA (deep) | Score: 351200.12 |
| atari-games-on-atari-2600-river-raid | IMPALA (deep) | Score: 29608.05 |
| atari-games-on-atari-2600-road-runner | IMPALA (deep) | Score: 57121.00 |
| atari-games-on-atari-2600-robotank | IMPALA (deep) | Score: 12.96 |
| atari-games-on-atari-2600-seaquest | IMPALA (deep) | Score: 1753.20 |
| atari-games-on-atari-2600-skiing | IMPALA (deep) | Score: -10180.38 |
| atari-games-on-atari-2600-solaris | IMPALA (deep) | Score: 2365.00 |
| atari-games-on-atari-2600-space-invaders | IMPALA (deep) | Score: 43595.78 |
| atari-games-on-atari-2600-star-gunner | IMPALA (deep) | Score: 200625.00 |
| atari-games-on-atari-2600-surround | IMPALA (deep) | Score: 7.56 |
| atari-games-on-atari-2600-tennis | IMPALA (deep) | Score: 0.55 |
| atari-games-on-atari-2600-time-pilot | IMPALA (deep) | Score: 48481.50 |
| atari-games-on-atari-2600-tutankham | IMPALA (deep) | Score: 292.11 |
| atari-games-on-atari-2600-up-and-down | IMPALA (deep) | Score: 332546.75 |
| atari-games-on-atari-2600-venture | IMPALA (deep) | Score: 0.00 |
| atari-games-on-atari-2600-video-pinball | IMPALA (deep) | Score: 572898.27 |
| atari-games-on-atari-2600-wizard-of-wor | IMPALA (deep) | Score: 9157.50 |
| atari-games-on-atari-2600-yars-revenge | IMPALA (deep) | Score: 84231.14 |
| atari-games-on-atari-2600-zaxxon | IMPALA (deep) | Score: 32935.50 |
| atari-games-on-atari-57 | IMPALA, deep | Human World Record Breakthrough: 3; Mean Human Normalized Score: 957.34% |
| atari-games-on-atari-games | IMPALA, deep | Mean Human Normalized Score: 957.34% |
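
The listing reports a mean human normalized score without saying how it is computed. A minimal sketch follows, assuming the commonly used definition: each game's raw score is rescaled so that a random policy maps to 0% and a human reference maps to 100%, and the per-game percentages are then averaged. The function names and the baseline numbers in the usage note are hypothetical placeholders, not values from this page.

```python
def human_normalized_score(agent, random, human):
    """Rescale a raw score so random play is 0% and the human reference is 100%."""
    return 100.0 * (agent - random) / (human - random)


def mean_human_normalized_score(results):
    """Average the per-game normalized scores.

    results maps a game name to an (agent, random, human) tuple of raw scores.
    """
    scores = [human_normalized_score(*triple) for triple in results.values()]
    return sum(scores) / len(scores)
```

With two made-up games, `{"game_a": (150.0, 0.0, 100.0), "game_b": (30.0, 10.0, 50.0)}`, the per-game values are 150% and 50%, so the mean is 100%. Because the mean is unbounded above, a few games with very large normalized scores can dominate it.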
