Self-Imitation Learning

Abstract

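This paper proposes Self-Imitation Learning (SIL), a simple off-policy actor-critic algorithm that learns to reproduce the agent's past good decisions. The algorithm is designed to test the hypothesis that exploiting past good experiences can indirectly drive deep exploration. Experiments show that SIL significantly improves the performance of Advantage Actor-Critic (A2C) on several hard-exploration Atari games and is competitive with state-of-the-art count-based exploration methods. SIL is also shown to improve the performance of Proximal Policy Optimization (PPO) on MuJoCo tasks.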
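As a rough illustration of what "reproducing past good decisions" can mean in practice, the sketch below computes a self-imitation loss over a batch of stored transitions. It assumes the clipped-advantage form max(R - V(s), 0), so only transitions whose observed return exceeded the current value estimate contribute. The function name `sil_losses`, its arguments, and the default weight `value_coef=0.01` are illustrative assumptions, not taken from this page or from the official repository.

```python
import math


def sil_losses(log_probs, values, returns, value_coef=0.01):
    """Sketch of a self-imitation loss over replayed transitions.

    log_probs: log pi(a|s) of each stored action under the current policy
    values:    current value estimates V(s) for the stored states
    returns:   discounted returns R actually observed from those states
    value_coef is a hypothetical weighting of the value term.
    """
    if not log_probs:
        raise ValueError("need at least one transition")

    policy_terms, value_terms = [], []
    for logp, v, r in zip(log_probs, values, returns):
        # Clipped advantage: zero whenever the past return did not beat V(s),
        # so only "good" past decisions are imitated.
        adv = max(r - v, 0.0)
        policy_terms.append(-logp * adv)    # raise probability of good actions
        value_terms.append(0.5 * adv ** 2)  # pull V(s) up toward the return

    n = len(policy_terms)
    policy_loss = sum(policy_terms) / n
    value_loss = sum(value_terms) / n
    return policy_loss, value_loss, policy_loss + value_coef * value_loss


if __name__ == "__main__":
    # Two replayed transitions: the first beat its value estimate, the second did not.
    p, v, total = sil_losses(
        log_probs=[math.log(0.5), math.log(0.25)],
        values=[1.0, 3.0],
        returns=[2.0, 1.0],
    )
    print(p, v, total)
```

In this toy batch the second transition contributes nothing to either term, because its return (1.0) is below the value estimate (3.0). In a full agent these losses would be minimized on samples from a replay buffer alongside the usual on-policy A2C or PPO update.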

Code Repositories

rwightman/pytorch-opensim-rl (PyTorch, mentioned on GitHub)
alhabk/SGEE--pytorch (PyTorch, mentioned on GitHub)
junhyukoh/self-imitation-learning (official, TensorFlow, mentioned on GitHub)

Benchmarks

| Benchmark | Method | Score |
| --- | --- | --- |
| atari-games-on-atari-2600-alien | A2C + SIL | 2242.2 |
| atari-games-on-atari-2600-amidar | A2C + SIL | 1362 |
| atari-games-on-atari-2600-assault | A2C + SIL | 1812 |
| atari-games-on-atari-2600-asterix | A2C + SIL | 17984.2 |
| atari-games-on-atari-2600-asteroids | A2C + SIL | 2259.4 |
| atari-games-on-atari-2600-atlantis | A2C + SIL | 3084781.7 |
| atari-games-on-atari-2600-bank-heist | A2C + SIL | 1137.8 |
| atari-games-on-atari-2600-battle-zone | A2C + SIL | 25075 |
| atari-games-on-atari-2600-beam-rider | A2C + SIL | 2366.2 |
| atari-games-on-atari-2600-bowling | A2C + SIL | 31.1 |
| atari-games-on-atari-2600-boxing | A2C + SIL | 99.6 |
| atari-games-on-atari-2600-breakout | A2C + SIL | 452 |
| atari-games-on-atari-2600-centipede | A2C + SIL | 7559.5 |
| atari-games-on-atari-2600-chopper-command | A2C + SIL | 6710 |
| atari-games-on-atari-2600-crazy-climber | A2C + SIL | 130185.8 |
| atari-games-on-atari-2600-demon-attack | A2C + SIL | 10140.5 |
| atari-games-on-atari-2600-double-dunk | A2C + SIL | 21.5 |
| atari-games-on-atari-2600-enduro | A2C + SIL | 1205.1 |
| atari-games-on-atari-2600-fishing-derby | A2C + SIL | 55.8 |
| atari-games-on-atari-2600-freeway | A2C + SIL | 32.2 |
| atari-games-on-atari-2600-frostbite | A2C + SIL | 6289.8 |
| atari-games-on-atari-2600-gopher | A2C + SIL | 23304.2 |
| atari-games-on-atari-2600-gravitar | A2C + SIL | 1874.2 |
| atari-games-on-atari-2600-hero | A2C + SIL | 33156.7 |
| atari-games-on-atari-2600-ice-hockey | A2C + SIL | -2.4 |
| atari-games-on-atari-2600-james-bond | A2C + SIL | 310.8 |
| atari-games-on-atari-2600-kangaroo | A2C + SIL | 2888.3 |
| atari-games-on-atari-2600-krull | A2C + SIL | 10614.6 |
| atari-games-on-atari-2600-kung-fu-master | A2C + SIL | 34449.2 |
| atari-games-on-atari-2600-montezumas-revenge | A2C + SIL | 1100 |
| atari-games-on-atari-2600-ms-pacman | A2C + SIL | 4025.1 |
| atari-games-on-atari-2600-name-this-game | A2C + SIL | 14958.2 |
| atari-games-on-atari-2600-pong | A2C + SIL | 20.9 |
| atari-games-on-atari-2600-private-eye | A2C + SIL | 661.2 |
| atari-games-on-atari-2600-qbert | A2C + SIL | 104975.6 |
| atari-games-on-atari-2600-river-raid | A2C + SIL | 14306.1 |
| atari-games-on-atari-2600-road-runner | A2C + SIL | 57071.7 |
| atari-games-on-atari-2600-robotank | A2C + SIL | 10.5 |
| atari-games-on-atari-2600-seaquest | A2C + SIL | 2456.5 |
| atari-games-on-atari-2600-space-invaders | A2C + SIL | 2951.7 |
| atari-games-on-atari-2600-star-gunner | A2C + SIL | 31309.2 |
| atari-games-on-atari-2600-tennis | A2C + SIL | -17.3 |
| atari-games-on-atari-2600-time-pilot | A2C + SIL | 10811.7 |
| atari-games-on-atari-2600-tutankham | A2C + SIL | 340.5 |
| atari-games-on-atari-2600-up-and-down | A2C + SIL | 53314.6 |
| atari-games-on-atari-2600-venture | A2C + SIL | 0 |
| atari-games-on-atari-2600-video-pinball | A2C + SIL | 461522.4 |
| atari-games-on-atari-2600-wizard-of-wor | A2C + SIL | 7088.3 |
| atari-games-on-atari-2600-zaxxon | A2C + SIL | 9164.2 |
