Evolution Strategies as a Scalable Alternative to Reinforcement Learning

Abstract

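We explore the use of Evolution Strategies (ES), a class of black-box optimization algorithms, as an alternative to popular reinforcement learning (RL) techniques based on Markov decision processes (MDPs), such as Q-learning and policy gradients. Experiments on MuJoCo and Atari show that ES is a viable solution strategy that scales extremely well with the number of available CPUs: by using a novel communication strategy based on common random numbers, our ES implementation only needs to communicate scalars, which makes it possible to scale to over a thousand parallel workers. This allows us to solve 3D humanoid walking in 10 minutes and to obtain competitive results on most Atari games after one hour of training. In addition, we highlight several advantages of ES as a black-box optimization technique: it is invariant to action frequency and delayed rewards, it tolerates extremely long horizons, and it does not need temporal discounting or value function approximation.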
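
The communication idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy objective `fitness`, the constant `TARGET`, the helper names, and the hyperparameters `sigma` and `alpha` are all assumptions made for illustration, and returns are standardized here as a simple stand-in for whatever return normalization a real system would use. The point it shows is that each worker draws its parameter perturbation from a seed and reports only a scalar return, so any node that knows the seeds can regenerate the perturbations and apply the update without ever receiving a noise vector.

```python
import numpy as np

# Hypothetical toy problem: the optimum of the black-box objective.
TARGET = np.array([0.5, 0.1, -0.3])


def fitness(theta):
    """Toy black-box objective standing in for an episode return (assumption)."""
    return -float(np.sum((theta - TARGET) ** 2))


def worker_evaluate(theta, seed, sigma):
    """One worker: rebuild its perturbation from a seed, return only a scalar."""
    eps = np.random.default_rng(seed).standard_normal(theta.shape)
    return fitness(theta + sigma * eps)


def es_step(theta, seeds, sigma=0.1, alpha=0.01):
    """One ES update computed from (seed, scalar return) pairs only."""
    returns = np.array([worker_evaluate(theta, s, sigma) for s in seeds])
    # Standardize returns so the step is insensitive to the reward scale.
    adv = (returns - returns.mean()) / (returns.std() + 1e-8)
    grad = np.zeros_like(theta)
    for seed, a in zip(seeds, adv):
        # Regenerate the same perturbation the worker used; it was never sent.
        eps = np.random.default_rng(seed).standard_normal(theta.shape)
        grad += a * eps
    grad /= len(seeds) * sigma
    return theta + alpha * grad


def train(iterations=200, workers=50, master_seed=0):
    """Run the sketch sequentially; the list comprehension plays the workers."""
    seed_rng = np.random.default_rng(master_seed)
    theta = np.zeros(3)
    for _ in range(iterations):
        seeds = [int(s) for s in seed_rng.integers(0, 2**31 - 1, size=workers)]
        theta = es_step(theta, seeds)
    return theta


if __name__ == "__main__":
    print(train())  # should land close to TARGET
```

In a distributed setting, the calls to `worker_evaluate` would run on separate machines, and the only per-episode message would be the scalar return paired with its seed, which is why the bandwidth cost stays small as the number of workers grows.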

Benchmarks

Benchmark                                  Method               Score
-----------------------------------------  -------------------  ---------
atari-games-on-atari-2600-alien            ES FF (1 hour) noop  994.0
atari-games-on-atari-2600-amidar           ES FF (1 hour) noop  112.0
atari-games-on-atari-2600-assault          ES FF (1 hour) noop  1673.9
atari-games-on-atari-2600-asterix          ES FF (1 hour) noop  1440
atari-games-on-atari-2600-asteroids        ES FF (1 hour) noop  1562.0
atari-games-on-atari-2600-atlantis         ES FF (1 hour) noop  1267410.0
atari-games-on-atari-2600-bank-heist       ES FF (1 hour) noop  225.0
atari-games-on-atari-2600-battle-zone      ES FF (1 hour) noop  16600.0
atari-games-on-atari-2600-beam-rider       ES FF (1 hour) noop  744.0
atari-games-on-atari-2600-berzerk          ES FF (1 hour) noop  686.0
atari-games-on-atari-2600-bowling          ES FF (1 hour) noop  30
atari-games-on-atari-2600-boxing           ES FF (1 hour) noop  49.8
atari-games-on-atari-2600-breakout         ES FF (1 hour) noop  9.5
atari-games-on-atari-2600-centipede        ES FF (1 hour) noop  7783.9
atari-games-on-atari-2600-chopper-command  ES FF (1 hour) noop  3710.0
atari-games-on-atari-2600-crazy-climber    ES FF (1 hour) noop  26430.0
atari-games-on-atari-2600-demon-attack     ES FF (1 hour) noop  1166.5
atari-games-on-atari-2600-double-dunk      ES FF (1 hour) noop  0.2
atari-games-on-atari-2600-enduro           ES FF (1 hour) noop  95.0
atari-games-on-atari-2600-fishing-derby    ES FF (1 hour) noop  -49.0
atari-games-on-atari-2600-freeway          ES FF (1 hour) noop  31.0
atari-games-on-atari-2600-frostbite        ES FF (1 hour) noop  370.0
atari-games-on-atari-2600-gopher           ES FF (1 hour) noop  582.0
atari-games-on-atari-2600-gravitar         ES FF (1 hour) noop  805.0
atari-games-on-atari-2600-ice-hockey       ES FF (1 hour) noop  -4.1
atari-games-on-atari-2600-kangaroo         ES FF (1 hour) noop  11200.0
atari-games-on-atari-2600-krull            ES FF (1 hour) noop  8647.2
atari-games-on-atari-2600-name-this-game   ES FF (1 hour) noop  4503.0
atari-games-on-atari-2600-pong             ES FF (1 hour) noop  21.0
atari-games-on-atari-2600-private-eye      ES FF (1 hour) noop  100.0
atari-games-on-atari-2600-qbert            ES FF (1 hour) noop  147.5
atari-games-on-atari-2600-river-raid       ES FF (1 hour) noop  5009.0
atari-games-on-atari-2600-road-runner      ES FF (1 hour) noop  16590.0
atari-games-on-atari-2600-robotank         ES FF (1 hour) noop  11.9
atari-games-on-atari-2600-seaquest         ES FF (1 hour) noop  1390.0
atari-games-on-atari-2600-space-invaders   ES FF (1 hour) noop  678.5
atari-games-on-atari-2600-star-gunner      ES FF (1 hour) noop  1470.0
atari-games-on-atari-2600-tennis           ES FF (1 hour) noop  -4.5
atari-games-on-atari-2600-time-pilot       ES FF (1 hour) noop  4970.0
atari-games-on-atari-2600-tutankham        ES FF (1 hour) noop  130.3
atari-games-on-atari-2600-up-and-down      ES FF (1 hour) noop  67974.0
atari-games-on-atari-2600-venture          ES FF (1 hour) noop  760.0
atari-games-on-atari-2600-video-pinball    ES FF (1 hour) noop  22834.8
atari-games-on-atari-2600-wizard-of-wor    ES FF (1 hour) noop  3480.0
atari-games-on-atari-2600-zaxxon           ES FF (1 hour) noop  6380.0
