
A Distributional Perspective on Reinforcement Learning


Abstract

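In this paper we argue for the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning agent. This contrasts with the common approach to reinforcement learning, which models the expectation of this return, or value. Although there is an established body of literature studying the value distribution, thus far it has always been used for a specific purpose, such as implementing risk-aware behavior. We begin with theoretical results in both the policy evaluation and control settings, exposing a significant distributional instability in the latter. We then use the distributional perspective to design a new algorithm that applies Bellman's equation to the learning of approximate value distributions. We evaluate our algorithm on the suite of games from the Arcade Learning Environment. We obtain both state-of-the-art results and anecdotal evidence demonstrating the importance of the value distribution in approximate reinforcement learning. Finally, we combine theoretical and empirical evidence to highlight the ways in which the value distribution affects learning in the approximate setting.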
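The algorithm described above learns a distribution over returns rather than only its expectation. In the distributional Bellman equation, Z(x, a) has the same distribution as R(x, a) + γZ(X', A'). When the distribution is represented on a fixed discrete support (the C51 method in the benchmark table uses 51 atoms), the shifted atoms r + γz no longer land on that support, so the target must be projected back onto it. The following is a minimal NumPy sketch of that categorical projection, assuming 51 atoms on [-10, 10], γ = 0.99, and greedy next-action selection by expected value. The function names `greedy_next_distribution` and `categorical_projection` are hypothetical and are not taken from the authors' code.

```python
import numpy as np

# Assumed hyperparameters for illustration: 51 atoms on [-10, 10].
V_MIN, V_MAX, NUM_ATOMS = -10.0, 10.0, 51
SUPPORT = np.linspace(V_MIN, V_MAX, NUM_ATOMS)


def greedy_next_distribution(next_probs_all, support=SUPPORT):
    """Pick Z(x', a*) with a* = argmax_a E[Z(x', a)].

    next_probs_all: (batch, num_actions, num_atoms) atom probabilities.
    Returns a (batch, num_atoms) array for the greedy action.
    """
    q_values = next_probs_all @ support        # expected return (the "value") per action
    best = np.argmax(q_values, axis=1)         # greedy action for each transition
    return next_probs_all[np.arange(len(best)), best]


def categorical_projection(next_probs, rewards, dones, gamma=0.99,
                           v_min=V_MIN, v_max=V_MAX):
    """Project r + gamma * Z(x', a*) back onto the fixed support.

    next_probs: (batch, num_atoms) probabilities of Z(x', a*).
    rewards, dones: (batch,) arrays; dones is 1.0 at terminal transitions.
    Returns (batch, num_atoms) target probabilities on the same support.
    """
    next_probs = np.asarray(next_probs, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    batch, num_atoms = next_probs.shape
    support = np.linspace(v_min, v_max, num_atoms)
    delta_z = (v_max - v_min) / (num_atoms - 1)

    # Distributional Bellman update applied atom by atom, clipped to [v_min, v_max].
    tz = rewards[:, None] + gamma * (1.0 - dones[:, None]) * support[None, :]
    tz = np.clip(tz, v_min, v_max)

    # Fractional position of each shifted atom on the fixed support.
    b = np.clip((tz - v_min) / delta_z, 0, num_atoms - 1)
    lower = np.floor(b).astype(int)
    upper = np.ceil(b).astype(int)
    rows = np.repeat(np.arange(batch)[:, None], num_atoms, axis=1)

    projected = np.zeros_like(next_probs)
    # Split each atom's mass between its two neighbouring atoms in proportion to distance.
    np.add.at(projected, (rows, lower), next_probs * (upper - b))
    np.add.at(projected, (rows, upper), next_probs * (b - lower))
    # If a shifted atom lands exactly on the support, both weights above are 0,
    # so give that support atom the full mass.
    exact = lower == upper
    np.add.at(projected, (rows[exact], lower[exact]), next_probs[exact])
    return projected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(4, 3, NUM_ATOMS))          # 4 transitions, 3 actions
    probs = np.exp(logits) / np.exp(logits).sum(axis=2, keepdims=True)
    target = categorical_projection(
        greedy_next_distribution(probs),
        rewards=np.array([1.0, 0.0, -1.0, 0.5]),
        dones=np.array([0.0, 0.0, 1.0, 0.0]),
    )
    print(target.sum(axis=1))    # each row should sum to 1 (up to rounding)
    print(target @ SUPPORT)      # expected value of each target distribution
```

Because the mass of each shifted atom is split linearly between its two neighbours, the projection conserves total probability and, for targets inside [-10, 10], also preserves the mean. In a full agent, the projected target would serve as the label for a cross-entropy loss against the predicted distribution Z(x, a); that training loop is omitted here.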


Benchmarks

Benchmark | Method | Metric
atari-games-on-atari-2600-alien | C51 noop | Score: 3166.0
atari-games-on-atari-2600-amidar | C51 noop | Score: 1735.0
atari-games-on-atari-2600-assault | C51 noop | Score: 7203.0
atari-games-on-atari-2600-asterix | C51 noop | Score: 406211
atari-games-on-atari-2600-asteroids | C51 noop | Score: 1516.0
atari-games-on-atari-2600-atlantis | C51 noop | Score: 841075.0
atari-games-on-atari-2600-bank-heist | C51 noop | Score: 976.0
atari-games-on-atari-2600-battle-zone | C51 noop | Score: 28742.0
atari-games-on-atari-2600-beam-rider | C51 noop | Score: 14074.0
atari-games-on-atari-2600-berzerk | C51 noop | Score: 1645.0
atari-games-on-atari-2600-bowling | C51 noop | Score: 81.8
atari-games-on-atari-2600-boxing | C51 noop | Score: 97.8
atari-games-on-atari-2600-breakout | C51 noop | Score: 748.0
atari-games-on-atari-2600-centipede | C51 noop | Score: 9646.0
atari-games-on-atari-2600-chopper-command | C51 noop | Score: 15600.0
atari-games-on-atari-2600-crazy-climber | C51 noop | Score: 179877.0
atari-games-on-atari-2600-demon-attack | C51 noop | Score: 130955.0
atari-games-on-atari-2600-double-dunk | C51 noop | Score: 2.5
atari-games-on-atari-2600-enduro | C51 noop | Score: 3454.0
atari-games-on-atari-2600-fishing-derby | C51 noop | Score: 8.9
atari-games-on-atari-2600-freeway | C51 noop | Score: 33.9
atari-games-on-atari-2600-frostbite | C51 noop | Score: 3965.0
atari-games-on-atari-2600-gopher | C51 noop | Score: 33641.0
atari-games-on-atari-2600-gravitar | C51 noop | Score: 440.0
atari-games-on-atari-2600-hero | C51 noop | Score: 38874
atari-games-on-atari-2600-ice-hockey | C51 noop | Score: -3.5
atari-games-on-atari-2600-james-bond | C51 noop | Score: 1909.0
atari-games-on-atari-2600-kangaroo | C51 noop | Score: 12853.0
atari-games-on-atari-2600-krull | C51 noop | Score: 9735.0
atari-games-on-atari-2600-kung-fu-master | C51 noop | Score: 48192.0
atari-games-on-atari-2600-ms-pacman | C51 noop | Score: 3415.0
atari-games-on-atari-2600-name-this-game | C51 noop | Score: 12542.0
atari-games-on-atari-2600-pong | C51 noop | Score: 20.9
atari-games-on-atari-2600-private-eye | C51 noop | Score: 15095.0
atari-games-on-atari-2600-qbert | C51 noop | Score: 23784
atari-games-on-atari-2600-river-raid | C51 noop | Score: 17322.0
atari-games-on-atari-2600-road-runner | C51 noop | Score: 55839.0
atari-games-on-atari-2600-robotank | C51 noop | Score: 52.3
atari-games-on-atari-2600-seaquest | C51 noop | Score: 266434.0
atari-games-on-atari-2600-space-invaders | C51 noop | Score: 5747.0
atari-games-on-atari-2600-star-gunner | C51 noop | Score: 49095.0
atari-games-on-atari-2600-tennis | C51 noop | Score: 23.1
atari-games-on-atari-2600-time-pilot | C51 noop | Score: 8329.0
atari-games-on-atari-2600-tutankham | C51 noop | Score: 280.0
atari-games-on-atari-2600-up-and-down | C51 noop | Score: 15612.0
atari-games-on-atari-2600-venture | C51 noop | Score: 1520.0
atari-games-on-atari-2600-video-pinball | C51 noop | Score: 949604.0
atari-games-on-atari-2600-wizard-of-wor | C51 noop | Score: 9300.0
atari-games-on-atari-2600-zaxxon | C51 noop | Score: 10513.0
