Learning values across many orders of magnitude

Abstract

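Most learning algorithms are not invariant to the scale of the function being approximated. We propose to adaptively normalize the targets used in learning. This is particularly useful in value-based reinforcement learning, where the magnitude of appropriate value approximations can change over time as the behavior policy is updated. Our main motivation is prior work on learning to play Atari games, where all rewards were clipped to a predetermined range. This clipping makes it easier to learn across many different games with a single learning algorithm, but a clipped reward function can result in qualitatively different behavior. Using adaptive normalization, we can remove this domain-specific heuristic without diminishing overall performance.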
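The abstract does not spell out the update rule, so the following is only a minimal sketch of one way adaptive target normalization could work, under stated assumptions: running estimates of the target mean and second moment are tracked, and the final linear layer is rescaled whenever those estimates change so that predictions on the original scale stay the same. The class name `PopArtHead`, the step size `beta`, and the scalar linear output layer are hypothetical choices made for illustration, not details taken from this page.

```python
import math


class PopArtHead:
    """Scalar linear output layer trained on normalized targets (illustrative sketch)."""

    def __init__(self, n_features, beta=0.01):
        self.w = [0.0] * n_features  # weights producing the normalized output
        self.b = 0.0                 # bias of the normalized output
        self.mu = 0.0                # running mean of the targets
        self.nu = 1.0                # running second moment of the targets
        self.beta = beta             # step size for the statistics (assumed value)

    @property
    def sigma(self):
        # Standard deviation from the first two moments, floored for stability.
        return math.sqrt(max(self.nu - self.mu ** 2, 1e-8))

    def normalized(self, h):
        # Output in normalized space for a feature vector h.
        return sum(wi * hi for wi, hi in zip(self.w, h)) + self.b

    def predict(self, h):
        # Prediction mapped back to the original target scale.
        return self.sigma * self.normalized(h) + self.mu

    def update_stats(self, target):
        # Move the statistics toward the new target, then rescale w and b so
        # that sigma * (w . h + b) + mu is unchanged for every input h.
        old_mu, old_sigma = self.mu, self.sigma
        self.mu = (1 - self.beta) * self.mu + self.beta * target
        self.nu = (1 - self.beta) * self.nu + self.beta * target ** 2
        new_sigma = self.sigma
        self.w = [wi * old_sigma / new_sigma for wi in self.w]
        self.b = (old_sigma * self.b + old_mu - self.mu) / new_sigma

    def normalize_target(self, target):
        # Target expressed in normalized space, for use in the loss.
        return (target - self.mu) / self.sigma


head = PopArtHead(n_features=2)
head.w, head.b = [0.5, -0.25], 0.1
features = [2.0, 4.0]

before = head.predict(features)
head.update_stats(1000.0)  # a target far larger than anything seen so far
after = head.predict(features)
```

In this sketch, `before` and `after` agree up to floating-point error even though the statistics jumped, which is the property that lets the learner handle rewards of very different magnitudes without clipping them. A gradient step on the normalized output against `normalize_target(target)` would follow the statistics update; that step is omitted here.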

Benchmarks

| Benchmark | Method | Metric |
| --- | --- | --- |
| atari-games-on-atari-2600-alien | DDQN+Pop-Art noop | Score: 3213.5 |
| atari-games-on-atari-2600-amidar | DDQN+Pop-Art noop | Score: 782.5 |
| atari-games-on-atari-2600-assault | DDQN+Pop-Art noop | Score: 9011.6 |
| atari-games-on-atari-2600-asterix | DDQN+Pop-Art noop | Score: 18919.5 |
| atari-games-on-atari-2600-asteroids | DDQN+Pop-Art noop | Score: 2869.3 |
| atari-games-on-atari-2600-atlantis | DDQN+Pop-Art noop | Score: 340076.0 |
| atari-games-on-atari-2600-bank-heist | DDQN+Pop-Art noop | Score: 1103.3 |
| atari-games-on-atari-2600-battle-zone | DDQN+Pop-Art noop | Score: 8220.0 |
| atari-games-on-atari-2600-beam-rider | DDQN+Pop-Art noop | Score: 8299.4 |
| atari-games-on-atari-2600-berzerk | DDQN+Pop-Art noop | Score: 1199.6 |
| atari-games-on-atari-2600-bowling | DDQN+Pop-Art noop | Score: 102.1 |
| atari-games-on-atari-2600-boxing | DDQN+Pop-Art noop | Score: 99.3 |
| atari-games-on-atari-2600-breakout | DDQN+Pop-Art noop | Score: 344.1 |
| atari-games-on-atari-2600-centipede | DDQN+Pop-Art noop | Score: 49065.8 |
| atari-games-on-atari-2600-chopper-command | DDQN+Pop-Art noop | Score: 775.0 |
| atari-games-on-atari-2600-crazy-climber | DDQN+Pop-Art noop | Score: 119679.0 |
| atari-games-on-atari-2600-demon-attack | DDQN+Pop-Art noop | Score: 63644.9 |
| atari-games-on-atari-2600-double-dunk | DDQN+Pop-Art noop | Score: -11.5 |
| atari-games-on-atari-2600-enduro | DDQN+Pop-Art noop | Score: 2002.1 |
| atari-games-on-atari-2600-fishing-derby | DDQN+Pop-Art noop | Score: 45.1 |
| atari-games-on-atari-2600-freeway | DDQN+Pop-Art noop | Score: 33.4 |
| atari-games-on-atari-2600-frostbite | DDQN+Pop-Art noop | Score: 3469.6 |
| atari-games-on-atari-2600-gopher | DDQN+Pop-Art noop | Score: 56218.2 |
| atari-games-on-atari-2600-gravitar | DDQN+Pop-Art noop | Score: 483.5 |
| atari-games-on-atari-2600-hero | DDQN+Pop-Art noop | Score: 14225.2 |
| atari-games-on-atari-2600-ice-hockey | DDQN+Pop-Art noop | Score: -4.1 |
| atari-games-on-atari-2600-james-bond | DDQN+Pop-Art noop | Score: 507.5 |
| atari-games-on-atari-2600-kangaroo | DDQN+Pop-Art noop | Score: 13150.0 |
| atari-games-on-atari-2600-krull | DDQN+Pop-Art noop | Score: 9745.1 |
| atari-games-on-atari-2600-kung-fu-master | DDQN+Pop-Art noop | Score: 34393.0 |
| atari-games-on-atari-2600-ms-pacman | DDQN+Pop-Art noop | Score: 4963.8 |
| atari-games-on-atari-2600-name-this-game | DDQN+Pop-Art noop | Score: 15851.2 |
| atari-games-on-atari-2600-pong | DDQN+Pop-Art noop | Score: 20.6 |
| atari-games-on-atari-2600-private-eye | DDQN+Pop-Art noop | Score: 286.7 |
| atari-games-on-atari-2600-qbert | DDQN+Pop-Art noop | Score: 5236.8 |
| atari-games-on-atari-2600-river-raid | DDQN+Pop-Art noop | Score: 12530.8 |
| atari-games-on-atari-2600-road-runner | DDQN+Pop-Art noop | Score: 47770.0 |
| atari-games-on-atari-2600-robotank | DDQN+Pop-Art noop | Score: 64.3 |
| atari-games-on-atari-2600-seaquest | DDQN+Pop-Art noop | Score: 10932.3 |
| atari-games-on-atari-2600-space-invaders | DDQN+Pop-Art noop | Score: 2589.7 |
| atari-games-on-atari-2600-star-gunner | DDQN+Pop-Art noop | Score: 589.0 |
| atari-games-on-atari-2600-tennis | DDQN+Pop-Art noop | Score: 12.1 |
| atari-games-on-atari-2600-time-pilot | DDQN+Pop-Art noop | Score: 4870.0 |
| atari-games-on-atari-2600-tutankham | DDQN+Pop-Art noop | Score: 183.9 |
| atari-games-on-atari-2600-up-and-down | DDQN+Pop-Art noop | Score: 22474.4 |
| atari-games-on-atari-2600-venture | DDQN+Pop-Art noop | Score: 1172.0 |
| atari-games-on-atari-2600-video-pinball | DDQN+Pop-Art noop | Score: 56287.0 |
| atari-games-on-atari-2600-wizard-of-wor | DDQN+Pop-Art noop | Score: 483.0 |
| atari-games-on-atari-2600-zaxxon | DDQN+Pop-Art noop | Score: 14402.0 |