Agent57: Outperforming the Atari Human Benchmark

Abstract

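For the past decade, Atari games have been a long-standing benchmark in the reinforcement learning (RL) community, intended to test the general competence of RL algorithms. Previous work performed very well on most games in the suite but very poorly on several of the most challenging ones. We propose Agent57, the first deep RL agent that outperforms the standard human benchmark on all 57 Atari games. To achieve this result, we train a neural network that parameterizes a family of policies ranging from highly exploratory to purely exploitative. We further propose an adaptive mechanism that chooses which policy to prioritize throughout training. In addition, we introduce a novel parameterization of the architecture that makes learning more consistent and stable.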
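The adaptive policy-selection idea can be illustrated with a multi-armed bandit that treats each policy in the family as an arm and favors the arms that recently produced the highest episode returns. The abstract does not say which selection rule is used, so the sketch below assumes a sliding-window UCB rule. The class name `SlidingWindowUCB`, the parameters `window`, `beta`, and `epsilon`, and the demo returns are all hypothetical choices made for illustration.

```python
import math
import random
from collections import deque


class SlidingWindowUCB:
    """Pick one of `num_policies` arms using UCB over a recent window of returns."""

    def __init__(self, num_policies, window=90, beta=1.0, epsilon=0.0, seed=0):
        self.num_policies = num_policies
        self.beta = beta        # weight of the exploration bonus (assumed value)
        self.epsilon = epsilon  # probability of picking a uniformly random arm
        self.history = deque(maxlen=window)  # recent (arm, episode_return) pairs
        self.rng = random.Random(seed)

    def select(self):
        counts = [0] * self.num_policies
        sums = [0.0] * self.num_policies
        for arm, ret in self.history:
            counts[arm] += 1
            sums[arm] += ret
        # Any arm with no sample inside the current window is tried first.
        for arm in range(self.num_policies):
            if counts[arm] == 0:
                return arm
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.num_policies)
        total = len(self.history)
        scores = [
            sums[a] / counts[a] + self.beta * math.sqrt(math.log(total) / counts[a])
            for a in range(self.num_policies)
        ]
        return max(range(self.num_policies), key=scores.__getitem__)

    def update(self, arm, episode_return):
        # Old entries fall out of the deque automatically once it is full.
        self.history.append((arm, episode_return))


def run_demo(steps=200):
    # Made-up mean returns for 4 policies, ordered from most exploratory
    # (index 0) to most exploitative (index 3). Illustration only.
    mean_returns = [0.1, 0.3, 0.9, 0.5]
    bandit = SlidingWindowUCB(num_policies=len(mean_returns), window=50)
    picks = [0] * len(mean_returns)
    for _ in range(steps):
        arm = bandit.select()
        bandit.update(arm, mean_returns[arm])  # deterministic return for the demo
        picks[arm] += 1
    return picks
```

Calling `run_demo()` returns how often each hypothetical policy was chosen. Because only a recent window of returns is kept, the selection can shift if a different policy starts to pay off later in training, which is the behavior an adaptive prioritization scheme needs.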

Code Repositories

yuta0821/agent57_pytorch (PyTorch, mentioned in GitHub)
michaelnny/deep_rl_zoo (PyTorch, mentioned in GitHub)
YHL04/agent57 (PyTorch, mentioned in GitHub)

Benchmarks

| Benchmark | Method | Metric | Value |
|---|---|---|---|
| atari-games-on-atari-2600-alien | Agent57 | Score | 297638.17 |
| atari-games-on-atari-2600-amidar | Agent57 | Score | 29660.08 |
| atari-games-on-atari-2600-assault | Agent57 | Score | 67212.67 |
| atari-games-on-atari-2600-asterix | Agent57 | Score | 991384.42 |
| atari-games-on-atari-2600-asteroids | Agent57 | Score | 150854.61 |
| atari-games-on-atari-2600-atlantis | Agent57 | Score | 1528841.76 |
| atari-games-on-atari-2600-bank-heist | Agent57 | Score | 23071.5 |
| atari-games-on-atari-2600-battle-zone | Agent57 | Score | 934134.88 |
| atari-games-on-atari-2600-beam-rider | Agent57 | Score | 300509.8 |
| atari-games-on-atari-2600-berzerk | Agent57 | Score | 61507.83 |
| atari-games-on-atari-2600-bowling | Agent57 | Score | 251.18 |
| atari-games-on-atari-2600-boxing | Agent57 | Score | 100 |
| atari-games-on-atari-2600-breakout | Agent57 | Score | 790.4 |
| atari-games-on-atari-2600-centipede | Agent57 | Score | 412847.86 |
| atari-games-on-atari-2600-chopper-command | Agent57 | Score | 999900 |
| atari-games-on-atari-2600-crazy-climber | Agent57 | Score | 565909.85 |
| atari-games-on-atari-2600-defender | Agent57 | Score | 677642.78 |
| atari-games-on-atari-2600-demon-attack | Agent57 | Score | 143161.44 |
| atari-games-on-atari-2600-double-dunk | Agent57 | Score | 23.93 |
| atari-games-on-atari-2600-enduro | Agent57 | Score | 2367.71 |
| atari-games-on-atari-2600-fishing-derby | Agent57 | Score | 86.97 |
| atari-games-on-atari-2600-freeway | Agent57 | Score | 32.59 |
| atari-games-on-atari-2600-frostbite | Agent57 | Score | 541280.88 |
| atari-games-on-atari-2600-gopher | Agent57 | Score | 117777.08 |
| atari-games-on-atari-2600-gravitar | Agent57 | Score | 19213.96 |
| atari-games-on-atari-2600-hero | Agent57 | Score | 114736.26 |
| atari-games-on-atari-2600-ice-hockey | Agent57 | Score | 63.64 |
| atari-games-on-atari-2600-james-bond | Agent57 | Score | 135784.96 |
| atari-games-on-atari-2600-kangaroo | Agent57 | Score | 24034.16 |
| atari-games-on-atari-2600-krull | Agent57 | Score | 251997.31 |
| atari-games-on-atari-2600-kung-fu-master | Agent57 | Score | 206845.82 |
| atari-games-on-atari-2600-montezumas-revenge | Agent57 | Score | 9352.01 |
| atari-games-on-atari-2600-ms-pacman | Agent57 | Score | 63994.44 |
| atari-games-on-atari-2600-name-this-game | Agent57 | Score | 54386.77 |
| atari-games-on-atari-2600-phoenix | Agent57 | Score | 908264.15 |
| atari-games-on-atari-2600-pitfall | Agent57 | Score | 18756.01 |
| atari-games-on-atari-2600-pong | Agent57 | Score | 20.67 |
| atari-games-on-atari-2600-private-eye | Agent57 | Score | 79716.46 |
| atari-games-on-atari-2600-qbert | Agent57 | Score | 580328.14 |
| atari-games-on-atari-2600-river-raid | Agent57 | Score | 63318.67 |
| atari-games-on-atari-2600-road-runner | Agent57 | Score | 243025.8 |
| atari-games-on-atari-2600-robotank | Agent57 | Score | 127.32 |
| atari-games-on-atari-2600-seaquest | Agent57 | Score | 999997.63 |
| atari-games-on-atari-2600-skiing | Agent57 | Score | -4202.6 |
| atari-games-on-atari-2600-solaris | Agent57 | Score | 44199.93 |
| atari-games-on-atari-2600-space-invaders | Agent57 | Score | 48680.86 |
| atari-games-on-atari-2600-star-gunner | Agent57 | Score | 839573.53 |
| atari-games-on-atari-2600-surround | Agent57 | Score | 9.5 |
| atari-games-on-atari-2600-tennis | Agent57 | Score | 23.84 |
| atari-games-on-atari-2600-time-pilot | Agent57 | Score | 405425.31 |
| atari-games-on-atari-2600-tutankham | Agent57 | Score | 2354.91 |
| atari-games-on-atari-2600-up-and-down | Agent57 | Score | 623805.73 |
| atari-games-on-atari-2600-venture | Agent57 | Score | 2623.71 |
| atari-games-on-atari-2600-video-pinball | Agent57 | Score | 992340.74 |
| atari-games-on-atari-2600-wizard-of-wor | Agent57 | Score | 157306.41 |
| atari-games-on-atari-2600-yars-revenge | Agent57 | Score | 998532.37 |
| atari-games-on-atari-2600-zaxxon | Agent57 | Score | 249808.9 |
| atari-games-on-atari-game | Agent57 | Human World Record Breakthrough | 18 |
| atari-games-on-atari-games | Agent57 | Mean Human Normalized Score | 4763.69% |
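
The page reports a mean human normalized score without defining it. A minimal sketch follows, assuming the convention commonly used for Atari benchmarks: each game's score is rescaled so that a random policy maps to 0% and the human baseline maps to 100%, and the per-game percentages are then averaged. The function names are hypothetical. The per-game random and human baselines are not listed on this page, so the inputs in the usage note are made-up numbers.

```python
def human_normalized_score(agent_score, random_score, human_score):
    """Return the human-normalized score as a percentage."""
    if human_score == random_score:
        raise ValueError("human and random scores must differ")
    return 100.0 * (agent_score - random_score) / (human_score - random_score)


def mean_human_normalized_score(rows):
    """Average the per-game percentages; `rows` holds (agent, random, human) tuples."""
    scores = [human_normalized_score(a, r, h) for a, r, h in rows]
    return sum(scores) / len(scores)
```

With made-up inputs, `human_normalized_score(300, 100, 200)` gives 200.0, meaning the agent's margin over random play is twice the human's margin. A mean computed this way can be dominated by a few games with very large percentages, which is why per-game results such as those in the table are still informative.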
