LORM: a novel reinforcement learning framework for biped gait control.

Author information

Zhang Weiyi, Jiang Yancao, Farrukh Fasih Ud Din, Zhang Chun, Zhang Debing, Wang Guangqi

Affiliations

School of Integrated Circuits, Tsinghua University, Beijing, People's Republic of China.

Publication information

PeerJ Comput Sci. 2022 Mar 28;8:e927. doi: 10.7717/peerj-cs.927. eCollection 2022.

Abstract

Legged robots adapt to varied terrains better than wheeled robots. However, traditional motion controllers must contend with extremely complex dynamics. Reinforcement learning (RL) helps to overcome the complications of dynamics design and computation, and the high autonomy of an RL controller yields a more robust response to complex environments and terrains than traditional controllers. However, RL algorithms are limited by convergence and training-efficiency problems due to the complexity of the task. Learn and outperform the reference motion (LORM), an RL-based framework for gait control of a biped robot, is proposed, leveraging the prior knowledge contained in a reference motion. The trained agent outperformed both the reference motion and existing motion-based methods. The RL environment was carefully crafted for optimal performance, including pruning of the state and action spaces, reward shaping, and design of the episode-termination criterion. Several improvements were implemented to further raise training efficiency and performance: random state initialization (RSI), noise injected into the joint angles, and a novel improvement based on gait symmetrization. To validate the proposed method, the Darwin-op robot was chosen as the target platform and two tasks were designed: (I) walking as fast as possible and (II) tracking a specified velocity. In task (I), the proposed method reached a walking velocity of 0.488 m/s, a 5.8× improvement over the original reference controller, and directional accuracy improved by 87.3%. The achieved velocity is 2× the rated maximum velocity and more than 8× that reported in other recent works; to our knowledge, this is the best velocity performance achieved on the Darwin-op platform. In task (II), the proposed method achieved a tracking accuracy of over 95%. To validate performance and robustness, different environments were introduced, including flat ground, slopes, uneven terrain, and walking under external force, in which the robot was expected to maintain stable walking at the desired speed with little directional deviation.
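
The abstract does not give the exact reward or augmentation details, but two of the ideas it names, reward shaping around a reference motion and gait symmetrization, can be illustrated with a minimal Python sketch. The joint layout, mirror sign convention, weights, and the exponential imitation kernel below are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Assumed 12-joint leg layout for a Darwin-op-like biped (6 joints per leg).
# Indices and mirror sign conventions are placeholders, not from the paper.
LEFT_LEG = [0, 1, 2, 3, 4, 5]
RIGHT_LEG = [6, 7, 8, 9, 10, 11]
MIRROR_SIGN = np.array([1, -1, -1, 1, 1, -1], dtype=float)


def shaped_reward(joint_angles, ref_angles, forward_velocity,
                  w_imitate=0.7, w_task=0.3):
    """Reward combining imitation of the reference motion with a task term.

    The weights and the exponential kernel are assumptions that only show the
    structure: stay close to the reference pose while being rewarded for
    forward progress.
    """
    imitation = np.exp(-2.0 * np.sum((joint_angles - ref_angles) ** 2))
    return w_imitate * imitation + w_task * forward_velocity


def mirror_transition(joint_state, joint_action):
    """Gait-symmetrization augmentation: swap left- and right-leg joints and
    flip the sign of lateral axes, so each sampled transition also contributes
    its mirrored counterpart to training."""
    def mirror(vec):
        out = np.array(vec, dtype=float).copy()
        out[LEFT_LEG] = vec[RIGHT_LEG] * MIRROR_SIGN
        out[RIGHT_LEG] = vec[LEFT_LEG] * MIRROR_SIGN
        return out
    return mirror(joint_state), mirror(joint_action)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(scale=0.1, size=12)       # current joint angles (rad)
    q_ref = rng.normal(scale=0.1, size=12)   # reference-motion angles (rad)
    print(shaped_reward(q, q_ref, forward_velocity=0.4))
    print(mirror_transition(q, q)[0])
```

Random state initialization (RSI), the other improvement named in the abstract, would analogously start each training episode at a randomly chosen phase of the reference motion rather than always at the beginning.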

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/463c/9044250/3a7f7c8c196e/peerj-cs-08-927-g001.jpg
