CURL: Contrastive Unsupervised Representations for Reinforcement Learning

Abstract

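We present CURL: Contrastive Unsupervised Representations for Reinforcement Learning. CURL uses contrastive learning to extract high-level features from raw pixels and performs off-policy control on top of the extracted features. On complex tasks in the DeepMind Control Suite and Atari games, CURL outperforms prior pixel-based methods, both model-based and model-free, with 1.9x and 1.2x performance gains respectively at the 100K environment-step and interaction-step benchmarks. On the DeepMind Control Suite, CURL is the first image-based algorithm to nearly match the sample efficiency of methods that use state-based features. The code is open source and available at https://github.com/MishaLaskin/curl.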
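The abstract describes the contrastive step only at a high level. The following is a minimal NumPy sketch of what such a step might look like, assuming an InfoNCE-style objective in which two random crops of the same observation form a positive pair and the other batch entries act as negatives. The crop augmentation, the bilinear similarity matrix `W`, and the `toy_encoder` stand-in are assumptions made for illustration and are not taken from this page; the off-policy control that runs on top of the features is not shown.

```python
import numpy as np


def random_crop(frames, out_size, rng):
    """Crop each image in a (B, C, H, W) batch at an independent random offset."""
    b, _, h, w = frames.shape
    tops = rng.integers(0, h - out_size + 1, size=b)
    lefts = rng.integers(0, w - out_size + 1, size=b)
    return np.stack([
        frames[i, :, t:t + out_size, l:l + out_size]
        for i, (t, l) in enumerate(zip(tops, lefts))
    ])


def toy_encoder(x, proj):
    """Hypothetical stand-in for a convolutional encoder: flatten, then project."""
    return x.reshape(x.shape[0], -1) @ proj


def info_nce_loss(queries, keys, W):
    """InfoNCE loss with bilinear similarity q^T W k (an assumed choice).

    queries, keys: (B, D) arrays. Row i of keys is the positive for row i of
    queries; every other row in the batch serves as a negative.
    """
    logits = queries @ W @ keys.T                         # (B, B) similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                   # positives lie on the diagonal


rng = np.random.default_rng(0)
obs = rng.random((8, 3, 20, 20))                  # fake batch of pixel observations
q_in = random_crop(obs, 16, rng)                  # "query" view
k_in = random_crop(obs, 16, rng)                  # "key" view of the same observations
proj = rng.normal(scale=0.01, size=(3 * 16 * 16, 32))
W = np.eye(32)                                    # would be learned in practice
loss = info_nce_loss(toy_encoder(q_in, proj), toy_encoder(k_in, proj), W)
print(f"contrastive loss: {loss:.4f}")
```

In a real agent the encoder and `W` would be trained by gradient descent on this loss alongside the reinforcement learning objective; the sketch only evaluates the loss once to show the data flow.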

Code repositories

MishaLaskin/curl (official, PyTorch)
gijskoning/ReproducingCURL (PyTorch, mentioned on GitHub)
aravindsrinivas/curl_rainbow (PyTorch, mentioned on GitHub)
asparius/barlowrl (PyTorch, mentioned on GitHub)

Benchmarks

Benchmark                                  Method  Metric
atari-games-on-atari-2600-alien            CURL    Score: 1148.2
atari-games-on-atari-2600-amidar           CURL    Score: 232.3
atari-games-on-atari-2600-assault          CURL    Score: 543.7
atari-games-on-atari-2600-asterix          CURL    Score: 524.3
atari-games-on-atari-2600-bank-heist       CURL    Score: 193.7
atari-games-on-atari-2600-battle-zone      CURL    Score: 11208
atari-games-on-atari-2600-boxing           CURL    Score: 4.8
atari-games-on-atari-2600-breakout         CURL    Score: 18.2
atari-games-on-atari-2600-chopper-command  CURL    Score: 1198
atari-games-on-atari-2600-crazy-climber    CURL    Score: 27805.6
atari-games-on-atari-2600-demon-attack     CURL    Score: 834
atari-games-on-atari-2600-freeway          CURL    Score: 27.9
atari-games-on-atari-2600-frostbite        CURL    Score: 924
atari-games-on-atari-2600-gopher           CURL    Score: 801.4
atari-games-on-atari-2600-hero             CURL    Score: 6235.1
atari-games-on-atari-2600-james-bond       CURL    Medium Human-Normalized Score: 400.1
atari-games-on-atari-2600-kangaroo         CURL    Score: 345.3
atari-games-on-atari-2600-krull            CURL    Score: 3833.6
atari-games-on-atari-2600-kung-fu-master   CURL    Score: 14280
atari-games-on-atari-2600-ms-pacman        CURL    Score: 1492.8
atari-games-on-atari-2600-pong             CURL    Score: 2.1
atari-games-on-atari-2600-private-eye      CURL    Score: 105.2
atari-games-on-atari-2600-qbert            CURL    Score: 1225.6
atari-games-on-atari-2600-road-runner      CURL    Score: 6786.7
atari-games-on-atari-2600-seaquest         CURL    Score: 408
atari-games-on-atari-2600-up-and-down      CURL    Score: 2735.2
continuous-control-on-ball-in-cup-catch    CURL    Score: 959
continuous-control-on-ball-in-cup-catch-1  CURL    Score: 769
continuous-control-on-cartpole-swingup     CURL    Score: 841
continuous-control-on-cartpole-swingup-1   CURL    Score: 582
continuous-control-on-cheetah-run          CURL    Score: 518
continuous-control-on-cheetah-run-1        CURL    Score: 299
continuous-control-on-finger-spin          CURL    Score: 926
continuous-control-on-finger-spin-1        CURL    Score: 767
continuous-control-on-reacher-easy         CURL    Score: 929
continuous-control-on-reacher-easy-1       CURL    Score: 538
continuous-control-on-walker-walk          CURL    Score: 902
continuous-control-on-walker-walk-1        CURL    Score: 403
