Smooth Exploration for Robotic Reinforcement Learning

Abstract

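Reinforcement learning (RL) enables robots to learn skills autonomously by interacting with the real world. In practice, deep reinforcement learning (deep RL) usually relies on unstructured, step-based exploration. This works well in simulation, but on real robots it tends to produce jerky, incoherent motion. Such shaky behavior lowers exploration efficiency and can even damage the robot. To address these problems, this paper adapts state-dependent exploration (SDE) to current deep RL algorithms. The adaptation requires two extensions to the original SDE: a more general feature representation, and periodic resampling of the noise. The result is a new exploration method, generalized state-dependent exploration (gSDE). We evaluate gSDE in simulation on PyBullet continuous control tasks and directly on three real robots: a tendon-driven elastic robot, a quadruped, and an RC car. The noise sampling interval of gSDE can be adjusted to trade off performance against motion smoothness, which makes it possible to train directly on real robots without sacrificing learning performance. The code is open source and available at https://github.com/DLR-RM/stable-baselines3.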
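To make the two ideas in the abstract concrete (noise that depends on state features, and noise parameters that are resampled only every few steps), here is a minimal standard-library sketch. It is not the authors' implementation: the function names, the sinusoidal stand-in features, and the values of `sigma` and `feature_dim` are all assumptions chosen for illustration.

```python
import math
import random


def rollout_noise(n_steps, sample_freq, feature_dim=4, sigma=0.3, seed=0):
    """Exploration noise for a 1-D action over one rollout (illustrative only).

    The noise at step t is a weighted sum of state features. The weights are
    drawn from a Gaussian and redrawn every `sample_freq` steps, so
    sample_freq=1 redraws them at every step.
    """
    rng = random.Random(seed)
    weights = None
    noise = []
    for t in range(n_steps):
        if t % sample_freq == 0:
            # Periodic resampling of the noise parameters.
            weights = [rng.gauss(0.0, sigma) for _ in range(feature_dim)]
        # Hypothetical slowly varying features standing in for whatever
        # representation the policy computes from the robot state.
        features = [math.sin(0.05 * t + k) for k in range(feature_dim)]
        noise.append(sum(w * f for w, f in zip(weights, features)))
    return noise


def roughness(values):
    """Mean absolute change between consecutive values (lower is smoother)."""
    return sum(abs(b - a) for a, b in zip(values, values[1:])) / (len(values) - 1)


per_step = rollout_noise(1000, sample_freq=1)
held = rollout_noise(1000, sample_freq=50)
print(f"roughness, resample every step:     {roughness(per_step):.3f}")
print(f"roughness, resample every 50 steps: {roughness(held):.3f}")
```

Between resamples the noise changes only as fast as the features do, which is why a longer sampling interval gives smoother motion in this toy setting. In Stable-Baselines3, the corresponding options are exposed as the `use_sde` and `sde_sample_freq` constructor arguments on algorithms such as PPO and SAC.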

Benchmarks

| Benchmark | Method | Return |
| --- | --- | --- |
| continuous-control-on-pybullet-ant | PPO | 2160 |
| continuous-control-on-pybullet-ant | A2C gSDE | 2560 |
| continuous-control-on-pybullet-ant | PPO gSDE | 2587 |
| continuous-control-on-pybullet-ant | A2C | 1967 |
| continuous-control-on-pybullet-ant | SAC gSDE | 3459 |
| continuous-control-on-pybullet-ant | SAC | 2859 |
| continuous-control-on-pybullet-ant | TD3 gSDE | 3267 |
| continuous-control-on-pybullet-ant | TD3 | 2865 |
| continuous-control-on-pybullet-halfcheetah | SAC | 2883 |
| continuous-control-on-pybullet-halfcheetah | PPO gSDE | 2760 |
| continuous-control-on-pybullet-halfcheetah | A2C gSDE | 2028 |
| continuous-control-on-pybullet-halfcheetah | TD3 | 2687 |
| continuous-control-on-pybullet-halfcheetah | PPO | 2254 |
| continuous-control-on-pybullet-halfcheetah | TD3 gSDE | 2578 |
| continuous-control-on-pybullet-halfcheetah | SAC gSDE | 2850 |
| continuous-control-on-pybullet-halfcheetah | A2C | 1652 |
| continuous-control-on-pybullet-hopper | A2C | 1559 |
| continuous-control-on-pybullet-hopper | PPO | 1622 |
| continuous-control-on-pybullet-hopper | TD3 | 2470 |
| continuous-control-on-pybullet-hopper | SAC gSDE | 2646 |
| continuous-control-on-pybullet-hopper | PPO gSDE | 2508 |
| continuous-control-on-pybullet-hopper | A2C gSDE | 1448 |
| continuous-control-on-pybullet-hopper | TD3 gSDE | 2353 |
| continuous-control-on-pybullet-hopper | SAC | 2477 |
| continuous-control-on-pybullet-walker2d | PPO | 1238 |
| continuous-control-on-pybullet-walker2d | A2C | 443 |
| continuous-control-on-pybullet-walker2d | SAC | 2215 |
| continuous-control-on-pybullet-walker2d | PPO gSDE | 1776 |
| continuous-control-on-pybullet-walker2d | TD3 | 2106 |
| continuous-control-on-pybullet-walker2d | A2C gSDE | 694 |
| continuous-control-on-pybullet-walker2d | TD3 gSDE | 1989 |
| continuous-control-on-pybullet-walker2d | SAC gSDE | 2341 |
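
Because the listing above interleaves baseline and gSDE entries, the comparison is easier to read as per-pair differences. The short script below simply regroups the listed returns and subtracts; the dictionary layout and names are illustrative choices, and since the listing gives single return values without spread, the differences should be read as a rough summary only.

```python
# Returns copied from the benchmark listing: (baseline, with gSDE).
RETURNS = {
    "ant": {"A2C": (1967, 2560), "PPO": (2160, 2587),
            "SAC": (2859, 3459), "TD3": (2865, 3267)},
    "halfcheetah": {"A2C": (1652, 2028), "PPO": (2254, 2760),
                    "SAC": (2883, 2850), "TD3": (2687, 2578)},
    "hopper": {"A2C": (1559, 1448), "PPO": (1622, 2508),
               "SAC": (2477, 2646), "TD3": (2470, 2353)},
    "walker2d": {"A2C": (443, 694), "PPO": (1238, 1776),
                 "SAC": (2215, 2341), "TD3": (2106, 1989)},
}


def deltas(table):
    """Map (environment, algorithm) to the gSDE return minus the baseline return."""
    return {
        (env, algo): gsde - base
        for env, algos in table.items()
        for algo, (base, gsde) in algos.items()
    }


d = deltas(RETURNS)
improved = [pair for pair, diff in d.items() if diff > 0]

for (env, algo), diff in sorted(d.items()):
    print(f"{env:12s} {algo:4s} {diff:+d}")
print(f"gSDE return is higher in {len(improved)} of {len(d)} pairs")
```

On these numbers the gSDE variant is higher for every algorithm on the ant task and for PPO on every task, while TD3 is lower with gSDE on three of the four tasks.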
