4 months ago

Distributional Reinforcement Learning with Quantile Regression

Abstract

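In reinforcement learning, an agent interacts with the environment by taking actions and observing the next state and reward. When sampled probabilistically, these state transitions, rewards, and actions can all induce randomness in the observed long-term return. Traditionally, reinforcement learning algorithms average over this randomness to estimate the value function. In this paper, we build on recent work advocating a distributional approach to reinforcement learning, in which the distribution over returns is modeled explicitly instead of only estimating its mean. That is, we examine methods of learning the value distribution instead of the value function. We give results that close a number of gaps between the theoretical and algorithmic results given by Bellemare, Dabney, and Munos (2017). First, we extend existing results to the approximate distribution setting. Second, we present a novel distributional reinforcement learning algorithm consistent with our theoretical formulation. Finally, we evaluate this new algorithm on the Atari 2600 games, observing that it significantly outperforms many of the recent improvements on DQN, including the related distributional algorithm C51.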
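
The abstract does not spell out the training objective, so the following is only a minimal sketch of the general idea of fitting quantiles of a return distribution rather than its mean. It assumes the quantile Huber loss commonly associated with quantile-regression approaches to distributional RL; the function names (`quantile_midpoints`, `huber`, `quantile_huber_loss`) and the default `kappa=1.0` are illustrative choices, not taken from this page.

```python
def quantile_midpoints(n):
    """Midpoints (2i + 1) / (2n) of n equal-probability bins (assumed convention)."""
    return [(2 * i + 1) / (2 * n) for i in range(n)]


def huber(u, kappa=1.0):
    """Huber loss: quadratic for |u| <= kappa, linear beyond it."""
    a = abs(u)
    return 0.5 * u * u if a <= kappa else kappa * (a - 0.5 * kappa)


def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """Asymmetrically weighted Huber loss between predicted quantiles and target samples.

    Summed over the predicted quantiles and averaged over the target samples.
    """
    taus = quantile_midpoints(len(pred_quantiles))
    total = 0.0
    for tau, theta in zip(taus, pred_quantiles):
        for z in target_samples:
            u = z - theta  # positive when the predicted quantile is too low
            # Weight tau when the prediction is too low, (1 - tau) when too high.
            weight = abs(tau - (1.0 if u < 0 else 0.0))
            total += weight * huber(u, kappa)
    return total / len(target_samples)


if __name__ == "__main__":
    # Hypothetical target returns and two candidate sets of quantile estimates.
    targets = [0.0, 1.0, 2.0, 3.0]
    print(quantile_huber_loss([0.5, 1.5, 2.5], targets))
    print(quantile_huber_loss([2.5, 1.5, 0.5], targets))
```

The asymmetric weight is what pushes each estimate toward its own quantile level: low quantiles are penalized more for overestimating, high quantiles more for underestimating, so an ordered set of estimates scores better than the same values in reverse order.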

Benchmarks

| Benchmark | Method | Score |
| --- | --- | --- |
| atari-games-on-atari-2600-alien | QR-DQN-1 | 4871 |
| atari-games-on-atari-2600-amidar | QR-DQN-1 | 1641 |
| atari-games-on-atari-2600-assault | QR-DQN-1 | 22012 |
| atari-games-on-atari-2600-asterix | QR-DQN-1 | 261025 |
| atari-games-on-atari-2600-asteroids | QR-DQN-1 | 4226 |
| atari-games-on-atari-2600-atlantis | QR-DQN-1 | 971850 |
| atari-games-on-atari-2600-bank-heist | QR-DQN-1 | 1249 |
| atari-games-on-atari-2600-battle-zone | QR-DQN-1 | 39268 |
| atari-games-on-atari-2600-beam-rider | QR-DQN-1 | 34821 |
| atari-games-on-atari-2600-berzerk | QR-DQN-1 | 3117 |
| atari-games-on-atari-2600-bowling | QR-DQN-1 | 77.2 |
| atari-games-on-atari-2600-boxing | QR-DQN-1 | 99.9 |
| atari-games-on-atari-2600-breakout | QR-DQN-1 | 742 |
| atari-games-on-atari-2600-centipede | QR-DQN-1 | 12447 |
| atari-games-on-atari-2600-chopper-command | QR-DQN-1 | 14667 |
| atari-games-on-atari-2600-crazy-climber | QR-DQN-1 | 161196 |
| atari-games-on-atari-2600-defender | QR-DQN-1 | 47887 |
| atari-games-on-atari-2600-demon-attack | QR-DQN-1 | 121551 |
| atari-games-on-atari-2600-double-dunk | QR-DQN-1 | 21.9 |
| atari-games-on-atari-2600-enduro | QR-DQN-1 | 2355 |
| atari-games-on-atari-2600-fishing-derby | QR-DQN-1 | 39 |
| atari-games-on-atari-2600-freeway | QR-DQN-1 | 34 |
| atari-games-on-atari-2600-frostbite | QR-DQN-1 | 4384 |
| atari-games-on-atari-2600-gopher | QR-DQN-1 | 113585 |
| atari-games-on-atari-2600-gravitar | QR-DQN-1 | 995 |
| atari-games-on-atari-2600-hero | QR-DQN-1 | 21395 |
| atari-games-on-atari-2600-ice-hockey | QR-DQN-1 | -1.7 |
| atari-games-on-atari-2600-james-bond | QR-DQN-1 | 4703 |
| atari-games-on-atari-2600-kangaroo | QR-DQN-1 | 15356 |
| atari-games-on-atari-2600-krull | QR-DQN-1 | 11447 |
| atari-games-on-atari-2600-kung-fu-master | QR-DQN-1 | 76642 |
| atari-games-on-atari-2600-montezumas-revenge | QR-DQN-1 | 0 |
| atari-games-on-atari-2600-ms-pacman | QR-DQN-1 | 5821 |
| atari-games-on-atari-2600-name-this-game | QR-DQN-1 | 21890 |
| atari-games-on-atari-2600-phoenix | QR-DQN-1 | 16585 |
| atari-games-on-atari-2600-pitfall | QR-DQN-1 | 0 |
| atari-games-on-atari-2600-pong | QR-DQN-1 | 21 |
| atari-games-on-atari-2600-private-eye | QR-DQN-1 | 350 |
| atari-games-on-atari-2600-qbert | QR-DQN-1 | 572510 |
| atari-games-on-atari-2600-river-raid | QR-DQN-1 | 17571 |
| atari-games-on-atari-2600-road-runner | QR-DQN-1 | 64262 |
| atari-games-on-atari-2600-robotank | QR-DQN-1 | 59.4 |
| atari-games-on-atari-2600-seaquest | QR-DQN-1 | 8268 |
| atari-games-on-atari-2600-skiing | QR-DQN-1 | -9324 |
| atari-games-on-atari-2600-solaris | QR-DQN-1 | 6740 |
| atari-games-on-atari-2600-space-invaders | QR-DQN-1 | 20972 |
| atari-games-on-atari-2600-star-gunner | QR-DQN-1 | 77495 |
| atari-games-on-atari-2600-surround | QR-DQN-1 | 8.2 |
| atari-games-on-atari-2600-tennis | QR-DQN-1 | 23.6 |
| atari-games-on-atari-2600-time-pilot | QR-DQN-1 | 10345 |
| atari-games-on-atari-2600-tutankham | QR-DQN-1 | 297 |
| atari-games-on-atari-2600-up-and-down | QR-DQN-1 | 71260 |
| atari-games-on-atari-2600-venture | QR-DQN-1 | 43.9 |
| atari-games-on-atari-2600-video-pinball | QR-DQN-1 | 705662 |
| atari-games-on-atari-2600-wizard-of-wor | QR-DQN-1 | 25061 |
| atari-games-on-atari-2600-yars-revenge | QR-DQN-1 | 26447 |
| atari-games-on-atari-2600-zaxxon | QR-DQN-1 | 13112 |
