
Mastering Atari with Discrete World Models

Abstract

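Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models support this kind of generalization and improve sample efficiency by letting the agent learn behaviors from imagined outcomes. Although learning world models from image inputs has recently become feasible for some tasks, modeling Atari games accurately enough to derive successful behaviors has long remained an open problem. This paper introduces DreamerV2, a reinforcement learning agent that learns behaviors purely from predictions in the compact latent space of a powerful world model. The world model uses discrete representations and is trained separately from the policy. DreamerV2 is the first agent to reach human-level performance on the Atari benchmark of 55 tasks by learning behaviors entirely inside a separately trained world model. With the same computational budget and wall-clock time, DreamerV2 reaches 200M frames and surpasses the final performance of the top single-GPU agents IQN and Rainbow. DreamerV2 also applies to tasks with continuous actions, where it learns an accurate world model of a complex humanoid robot from pixel inputs alone and solves behaviors such as standing up and walking.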
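To make "discrete representations" concrete, the following is a minimal sketch of sampling a latent state made of several categorical variables, each encoded as a one-hot vector. It is an illustration under stated assumptions, not the authors' implementation: the function name `sample_discrete_latent`, the sizes (32 variables with 32 classes each), and the random logits that stand in for an encoder output are all chosen here for the example. Training such a latent end to end also requires a gradient estimator for the sampling step, which this sketch omits.

```python
import math
import random


def sample_discrete_latent(logits, rng):
    """Sample one one-hot vector per categorical variable.

    logits: list of lists with shape [num_variables][num_classes].
    Returns a list of one-hot lists with the same shape.
    """
    latent = []
    for row in logits:
        # Unnormalized softmax weights; subtracting the max keeps exp() stable.
        top = max(row)
        weights = [math.exp(x - top) for x in row]
        # Draw one class index in proportion to its softmax weight.
        index = rng.choices(range(len(row)), weights=weights, k=1)[0]
        latent.append([1.0 if i == index else 0.0 for i in range(len(row))])
    return latent


rng = random.Random(0)

# Assumed sizes for illustration only.
NUM_VARIABLES, NUM_CLASSES = 32, 32

# Random logits stand in for the output of a (hypothetical) encoder network.
logits = [
    [rng.gauss(0.0, 1.0) for _ in range(NUM_CLASSES)]
    for _ in range(NUM_VARIABLES)
]

latent = sample_discrete_latent(logits, rng)

# Flatten into a single sparse vector that downstream networks could consume.
flat = [value for row in latent for value in row]
```

Each row of `latent` contains exactly one 1.0, so the flattened vector has `NUM_VARIABLES` active entries out of `NUM_VARIABLES * NUM_CLASSES`.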

Code Repositories

CVC-Lab/SAC-for-H-Bond-Learning (pytorch, mentioned on GitHub)
LukeBolly/dreamerv2 (tf, mentioned on GitHub)
adityabingi/Dreamer (pytorch, mentioned on GitHub)
andrejorsula/drl_grasping (pytorch, mentioned on GitHub)
danijar/dreamerv2 (official, tf, mentioned on GitHub)
chandar-lab/LoCA2 (tf, mentioned on GitHub)
RajGhugare19/dreamerv2 (pytorch, mentioned on GitHub)

Benchmarks

Benchmark | Method | Metric
atari-games-on-atari-2600-alien | DreamerV2 | Score: 3967
atari-games-on-atari-2600-amidar | DreamerV2 | Score: 2577
atari-games-on-atari-2600-assault | DreamerV2 | Score: 23625
atari-games-on-atari-2600-asterix | DreamerV2 | Score: 72311
atari-games-on-atari-2600-asteroids | DreamerV2 | Score: 41526
atari-games-on-atari-2600-atlantis | DreamerV2 | Score: 978778
atari-games-on-atari-2600-bank-heist | DreamerV2 | Score: 1126
atari-games-on-atari-2600-battle-zone | DreamerV2 | Score: 40325
atari-games-on-atari-2600-beam-rider | DreamerV2 | Score: 18646
atari-games-on-atari-2600-berzerk | DreamerV2 | Score: 810
atari-games-on-atari-2600-bowling | DreamerV2 | Score: 49
atari-games-on-atari-2600-boxing | DreamerV2 | Score: 92
atari-games-on-atari-2600-breakout | DreamerV2 | Score: 312
atari-games-on-atari-2600-centipede | DreamerV2 | Score: 11883
atari-games-on-atari-2600-chopper-command | DreamerV2 | Score: 2861
atari-games-on-atari-2600-crazy-climber | DreamerV2 | Score: 161839
atari-games-on-atari-2600-demon-attack | DreamerV2 | Score: 82263
atari-games-on-atari-2600-double-dunk | DreamerV2 | Score: 17
atari-games-on-atari-2600-enduro | DreamerV2 | Score: 1656
atari-games-on-atari-2600-fishing-derby | DreamerV2 | Score: 65
atari-games-on-atari-2600-freeway | DreamerV2 | Score: 33
atari-games-on-atari-2600-frostbite | DreamerV2 | Score: 11384
atari-games-on-atari-2600-gopher | DreamerV2 | Score: 92282
atari-games-on-atari-2600-gravitar | DreamerV2 | Score: 3789
atari-games-on-atari-2600-hero | DreamerV2 | Score: 21868
atari-games-on-atari-2600-ice-hockey | DreamerV2 | Score: 26
atari-games-on-atari-2600-james-bond | DreamerV2 | Score: 40445
atari-games-on-atari-2600-kangaroo | DreamerV2 | Score: 14064
atari-games-on-atari-2600-krull | DreamerV2 | Score: 50061
atari-games-on-atari-2600-kung-fu-master | DreamerV2 | Score: 62741
atari-games-on-atari-2600-montezumas-revenge | DreamerV2 | Score: 81
atari-games-on-atari-2600-ms-pacman | DreamerV2 | Score: 5652
atari-games-on-atari-2600-name-this-game | DreamerV2 | Score: 14649
atari-games-on-atari-2600-phoenix | DreamerV2 | Score: 49375
atari-games-on-atari-2600-pitfall | DreamerV2 | Score: 0
atari-games-on-atari-2600-pong | DreamerV2 | Score: 20
atari-games-on-atari-2600-private-eye | DreamerV2 | Score: 2198
atari-games-on-atari-2600-qbert | DreamerV2 | Score: 94688
atari-games-on-atari-2600-river-raid | DreamerV2 | Score: 16351
atari-games-on-atari-2600-road-runner | DreamerV2 | Score: 203576
atari-games-on-atari-2600-robotank | DreamerV2 | Score: 78
atari-games-on-atari-2600-seaquest | DreamerV2 | Score: 7480
atari-games-on-atari-2600-skiing | DreamerV2 | Score: -9299
atari-games-on-atari-2600-solaris | DreamerV2 | Score: 922
atari-games-on-atari-2600-space-invaders | DreamerV2 | Score: 2474
atari-games-on-atari-2600-star-gunner | DreamerV2 | Score: 7800
atari-games-on-atari-2600-tennis | DreamerV2 | Score: 14
atari-games-on-atari-2600-time-pilot | DreamerV2 | Score: 37945
atari-games-on-atari-2600-tutankham | DreamerV2 | Score: 264
atari-games-on-atari-2600-up-and-down | DreamerV2 | Score: 653662
atari-games-on-atari-2600-venture | DreamerV2 | Score: 2
atari-games-on-atari-2600-video-pinball | DreamerV2 | Score: 41860
atari-games-on-atari-2600-wizard-of-wor | DreamerV2 | Score: 12851
atari-games-on-atari-2600-yars-revenge | DreamerV2 | Score: 156748
atari-games-on-atari-2600-zaxxon | DreamerV2 | Score: 50699
atari-games-on-atari-games | DreamerV2 | Mean Human Normalized Score: 631.17%
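The last entry aggregates per-game results into a mean human-normalized score. A common way to compute such a figure is to rescale each game so that a random policy maps to 0% and a human reference maps to 100%, then average across games. The sketch below shows that arithmetic for three of the agent scores listed above. The random and human baselines in it are hypothetical placeholders chosen for illustration, not the values behind the reported 631.17%, and the helper name `human_normalized` is likewise an assumption.

```python
def human_normalized(agent, random_score, human):
    """Return the human-normalized score as a percentage.

    0% corresponds to the random baseline, 100% to the human baseline.
    """
    return 100.0 * (agent - random_score) / (human - random_score)


# Agent scores are taken from the benchmark listing above.
agent_scores = {"pong": 20.0, "breakout": 312.0, "boxing": 92.0}

# (random, human) baselines: HYPOTHETICAL placeholders for illustration only,
# not the baselines used to compute the reported benchmark figure.
baselines = {
    "pong": (-20.0, 15.0),
    "breakout": (2.0, 30.0),
    "boxing": (0.0, 12.0),
}

per_game = {
    game: human_normalized(agent_scores[game], *baselines[game])
    for game in agent_scores
}

# The mean over games is sensitive to outliers: one game with a very high
# normalized score can dominate the average.
mean_score = sum(per_game.values()) / len(per_game)
```

Because the mean is dominated by games where the agent far exceeds the human reference, it is usually read alongside per-game scores rather than on its own.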
