4 months ago

Increasing the Action Gap: New Operators for Reinforcement Learning

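Abstract

This paper introduces new optimality-preserving operators on Q-functions. We first describe an operator for tabular representations, the consistent Bellman operator, which incorporates a notion of local policy consistency. We show that this local consistency leads to an increase in the action gap at each state; increasing this gap, we argue, mitigates the undesirable effects of approximation and estimation errors on the induced greedy policies. This operator can also be applied to discretized continuous space and time problems, and we provide empirical results evidencing superior performance in this context. Extending the idea of a locally consistent operator, we then derive sufficient conditions for an operator to preserve optimality, leading to a family of operators that includes our consistent Bellman operator. As corollaries, we provide a proof of optimality for Baird's advantage learning algorithm and derive other gap-increasing operators with interesting properties. We conclude with an empirical study on 60 Atari 2600 games illustrating the strong potential of these new operators.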
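The operators named in the abstract can be illustrated on a small tabular MDP. The following is a minimal sketch, not the authors' implementation: the operator forms are written as commonly stated for this paper (consistent Bellman, advantage learning with coefficient `alpha`, and persistent advantage learning), and the random MDP, the array layout `P[x, a, x']`, the helper names, and the values of `gamma` and `alpha` are all assumptions made for illustration.

```python
import numpy as np

def bellman(Q, P, R, gamma):
    """Standard Bellman optimality operator: R + gamma * E[max_b Q(x', b)]."""
    V = Q.max(axis=1)                      # V(x') = max_b Q(x', b), shape (S,)
    return R + gamma * (P @ V)             # (S, A, S) @ (S,) -> (S, A)

def consistent_bellman(Q, P, R, gamma):
    """On a self-transition (x' == x), back up Q(x, a) instead of max_b Q(x, b)."""
    idx = np.arange(Q.shape[0])
    p_self = P[idx, :, idx]                # p_self[x, a] = P(x | x, a)
    V = Q.max(axis=1, keepdims=True)
    return bellman(Q, P, R, gamma) - gamma * p_self * (V - Q)

def advantage_learning(Q, P, R, gamma, alpha=0.5):
    """Bellman backup minus alpha times the current gap V(x) - Q(x, a)."""
    V = Q.max(axis=1, keepdims=True)
    return bellman(Q, P, R, gamma) - alpha * (V - Q)

def persistent_advantage_learning(Q, P, R, gamma, alpha=0.5):
    """Max of the AL backup and a backup that repeats action a in the next state."""
    persist = R + gamma * np.einsum("xay,ya->xa", P, Q)   # E[Q(x', a)]
    return np.maximum(advantage_learning(Q, P, R, gamma, alpha), persist)

def action_gap(Q):
    """Per-state difference between the best and second-best action values."""
    top2 = np.sort(Q, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]

def iterate(op, P, R, gamma, n_iter=2000, **kwargs):
    """Apply an operator repeatedly, starting from Q = 0."""
    Q = np.zeros_like(R)
    for _ in range(n_iter):
        Q = op(Q, P, R, gamma, **kwargs)
    return Q

# Hypothetical random MDP: 5 states, 3 actions (illustration only).
rng = np.random.default_rng(0)
S, A, gamma, alpha = 5, 3, 0.9, 0.5
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)          # normalize each P[x, a, :] to a distribution
R = rng.random((S, A))

Q_star = iterate(bellman, P, R, gamma)
Q_c = iterate(consistent_bellman, P, R, gamma)
Q_al = iterate(advantage_learning, P, R, gamma, alpha=alpha)
Q_pal = iterate(persistent_advantage_learning, P, R, gamma, alpha=alpha)

for name, Q in [("Bellman", Q_star), ("consistent", Q_c),
                ("AL", Q_al), ("persistent AL", Q_pal)]:
    print(f"{name:>14}: mean action gap = {action_gap(Q).mean():.4f}")
```

Under these assumptions, all four iterations should agree on the state values max_a Q(x, a) and on the greedy policy, while the gap-increasing operators push the values of suboptimal actions down. For advantage learning in particular, the limiting gap in this sketch should be 1 / (1 - alpha) times the Bellman gap.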

Code Repositories

chainer/chainerrl | pytorch | Mentioned in GitHub
janhuenermann/neurojs | tf | Mentioned in GitHub

Benchmarks

Benchmark | Method | Score
atari-games-on-atari-2600-alien | Persistent AL | 5699.81
atari-games-on-atari-2600-alien | Advantage Learning | 4990.91
atari-games-on-atari-2600-amidar | Persistent AL | 1451.65
atari-games-on-atari-2600-amidar | Advantage Learning | 1557.43
atari-games-on-atari-2600-assault | Advantage Learning | 3661.51
atari-games-on-atari-2600-assault | Persistent AL | 3304.33
atari-games-on-atari-2600-asterix | Persistent AL | 19564.9
atari-games-on-atari-2600-asterix | Advantage Learning | 12852.08
atari-games-on-atari-2600-asteroids | Advantage Learning | 1924.42
atari-games-on-atari-2600-asteroids | Persistent AL | 1673.52
atari-games-on-atari-2600-atlantis | Persistent AL | 1465250
atari-games-on-atari-2600-atlantis | Advantage Learning | 553591.67
atari-games-on-atari-2600-bank-heist | Advantage Learning | 633.63
atari-games-on-atari-2600-bank-heist | Persistent AL | 874.99
atari-games-on-atari-2600-battle-zone | Advantage Learning | 28789.29
atari-games-on-atari-2600-battle-zone | Persistent AL | 34583.07
atari-games-on-atari-2600-beam-rider | Persistent AL | 13145.34
atari-games-on-atari-2600-beam-rider | Advantage Learning | 10054.58
atari-games-on-atari-2600-berzerk | Persistent AL | 1328.25
atari-games-on-atari-2600-berzerk | Advantage Learning | 747.26
atari-games-on-atari-2600-bowling | Advantage Learning | 57.41
atari-games-on-atari-2600-bowling | Persistent AL | 71.59
atari-games-on-atari-2600-boxing | Persistent AL | 94.3
atari-games-on-atari-2600-boxing | Advantage Learning | 93.94
atari-games-on-atari-2600-breakout | Advantage Learning | 425.32
atari-games-on-atari-2600-breakout | Persistent AL | 431.89
atari-games-on-atari-2600-centipede | Persistent AL | 4539.55
atari-games-on-atari-2600-centipede | Advantage Learning | 4225.18
atari-games-on-atari-2600-chopper-command | Advantage Learning | 5431.36
atari-games-on-atari-2600-chopper-command | Persistent AL | 5734.93
atari-games-on-atari-2600-crazy-climber | Persistent AL | 130002.71
atari-games-on-atari-2600-crazy-climber | Advantage Learning | 123410.71
atari-games-on-atari-2600-defender | Advantage Learning | 30643.59
atari-games-on-atari-2600-defender | Persistent AL | 32038.93
atari-games-on-atari-2600-demon-attack | Persistent AL | 70908.17
atari-games-on-atari-2600-demon-attack | Advantage Learning | 27153.48
atari-games-on-atari-2600-double-dunk | Persistent AL | -2.51
atari-games-on-atari-2600-double-dunk | Advantage Learning | -0.15
atari-games-on-atari-2600-elevator-action | Persistent AL | 29100
atari-games-on-atari-2600-elevator-action | Advantage Learning | 27088.89
atari-games-on-atari-2600-enduro | Advantage Learning | 1252.7
atari-games-on-atari-2600-enduro | Persistent AL | 1343.1
atari-games-on-atari-2600-fishing-derby | Advantage Learning | 21.32
atari-games-on-atari-2600-fishing-derby | Persistent AL | 28.13
atari-games-on-atari-2600-freeway | Advantage Learning | 31.72
atari-games-on-atari-2600-freeway | Persistent AL | 32.3
atari-games-on-atari-2600-frostbite | Persistent AL | 3248.96
atari-games-on-atari-2600-frostbite | Advantage Learning | 2305.82
atari-games-on-atari-2600-gopher | Persistent AL | 10611.81
atari-games-on-atari-2600-gopher | Advantage Learning | 11912.68
atari-games-on-atari-2600-gravitar | Persistent AL | 446.92
atari-games-on-atari-2600-gravitar | Advantage Learning | 417.65
atari-games-on-atari-2600-hero | Persistent AL | 24175.79
atari-games-on-atari-2600-hero | Advantage Learning | 24788.86
atari-games-on-atari-2600-ice-hockey | Advantage Learning | -1.24
atari-games-on-atari-2600-ice-hockey | Persistent AL | -0.25
atari-games-on-atari-2600-james-bond | Persistent AL | 772.09
atari-games-on-atari-2600-james-bond | Advantage Learning | 848.46
atari-games-on-atari-2600-kangaroo | Advantage Learning | 10809.16
atari-games-on-atari-2600-kangaroo | Persistent AL | 11478.46
atari-games-on-atari-2600-krull | Persistent AL | 8689.81
atari-games-on-atari-2600-krull | Advantage Learning | 9548.92
atari-games-on-atari-2600-kung-fu-master | Advantage Learning | 32182.99
atari-games-on-atari-2600-kung-fu-master | Persistent AL | 34650.91
atari-games-on-atari-2600-montezumas-revenge | Advantage Learning | 0.42
atari-games-on-atari-2600-montezumas-revenge | Persistent AL | 1.72
atari-games-on-atari-2600-ms-pacman | Persistent AL | 3917.55
atari-games-on-atari-2600-ms-pacman | Advantage Learning | 4065.8
atari-games-on-atari-2600-name-this-game | Persistent AL | 10431.33
atari-games-on-atari-2600-name-this-game | Advantage Learning | 11025.26
atari-games-on-atari-2600-phoenix | Advantage Learning | 22038.27
atari-games-on-atari-2600-phoenix | Persistent AL | 14495.56
atari-games-on-atari-2600-pitfall | Advantage Learning | 0
atari-games-on-atari-2600-pong | Advantage Learning | 19.66
atari-games-on-atari-2600-pong | Persistent AL | 19.76
atari-games-on-atari-2600-pooyan | Advantage Learning | 4801.27
atari-games-on-atari-2600-private-eye | Advantage Learning | 5276.16
atari-games-on-atari-2600-qbert | Advantage Learning | 14368.03
atari-games-on-atari-2600-river-raid | Advantage Learning | 10585.12
atari-games-on-atari-2600-road-runner | Advantage Learning | 52351.23
atari-games-on-atari-2600-robotank | Advantage Learning | 69.31
atari-games-on-atari-2600-seaquest | Advantage Learning | 8670.5
atari-games-on-atari-2600-seaquest | Persistent AL | 13230.74
atari-games-on-atari-2600-skiing | Advantage Learning | -13264.51
atari-games-on-atari-2600-solaris | Advantage Learning | 4785.16
atari-games-on-atari-2600-space-invaders | Advantage Learning | 3460.79
atari-games-on-atari-2600-space-invaders | Persistent AL | 3277.59
atari-games-on-atari-2600-star-gunner | Advantage Learning | 61353.59
atari-games-on-atari-2600-surround | Persistent AL | 0.72
atari-games-on-atari-2600-tennis | Advantage Learning | 0
atari-games-on-atari-2600-time-pilot | Advantage Learning | 8969.12
atari-games-on-atari-2600-tutankham | Advantage Learning | 245.22
atari-games-on-atari-2600-up-and-down | Advantage Learning | 13909.74
atari-games-on-atari-2600-venture | Advantage Learning | 198.69
atari-games-on-atari-2600-video-pinball | Advantage Learning | 543504
atari-games-on-atari-2600-wizard-of-wor | Advantage Learning | 9541.14
atari-games-on-atari-2600-yars-revenge | Advantage Learning | 24240.03
atari-games-on-atari-2600-zaxxon | Advantage Learning | 9129.61
