Learning agility and adaptive legged locomotion via curricular hindsight reinforcement learning.

Author information

Li Sicen, Wang Gang, Pang Yiming, Bai Panju, Hu Shihao, Liu Zhaojin, Wang Liquan, Li Jiawei

Affiliations

The College of Mechanical and Electrical Engineering, Harbin Engineering University, Harbin, 150001, China.

The College of Shipbuilding Engineering, Harbin Engineering University, Harbin, 150001, China.

Publication information

Sci Rep. 2024 Nov 15;14(1):28089. doi: 10.1038/s41598-024-79292-4.

DOI:10.1038/s41598-024-79292-4

PMID:39543355

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11564515/

Abstract

Agile and adaptive maneuvers such as fall recovery, high-speed turning, and sprinting in the wild are challenging for legged systems. We propose Curricular Hindsight Reinforcement Learning (CHRL), a method that learns an end-to-end tracking controller achieving powerful agility and adaptation for a legged robot. Its two key components are (i) a novel automatic curriculum strategy on task difficulty and (ii) a Hindsight Experience Replay strategy adapted to legged locomotion tasks. We demonstrate agile and adaptive locomotion on a real quadruped robot that autonomously recovered from falls, trotted coherently, sustained outdoor running speeds of up to 3.45 m/s, and reached a maximum yaw rate of 3.2 rad/s. The system produces adaptive behaviors in response to changing situations and unexpected disturbances on natural terrains such as grass and dirt.
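The abstract does not say how Hindsight Experience Replay is adapted to locomotion, so the following is only a minimal sketch of the general hindsight idea applied to velocity-command tracking, not the authors' implementation. It assumes a transition stores the commanded and achieved velocities, and that a failed episode can be relabeled as if the velocity finally achieved had been the command. The names `Transition`, `tracking_reward`, `hindsight_relabel`, and the `sigma` parameter are all hypothetical.

```python
import math
from dataclasses import dataclass, replace
from typing import List, Tuple

# (forward m/s, lateral m/s, yaw rad/s); this layout is an assumption.
Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Transition:
    command: Vec3   # velocity command the policy was asked to track
    achieved: Vec3  # velocity the robot actually reached at this step
    reward: float   # reward computed against `command`


def tracking_reward(command: Vec3, achieved: Vec3, sigma: float = 0.25) -> float:
    """Hypothetical reward: 1.0 for perfect tracking, decaying with squared error."""
    err = sum((c - a) ** 2 for c, a in zip(command, achieved))
    return math.exp(-err / sigma)


def hindsight_relabel(episode: List[Transition]) -> List[Transition]:
    """Return a copy of the episode relabeled with the final achieved velocity
    as the command, with rewards recomputed against that new command."""
    if not episode:
        return []
    new_command = episode[-1].achieved
    return [
        replace(t, command=new_command,
                reward=tracking_reward(new_command, t.achieved))
        for t in episode
    ]


# Usage: the robot was asked for 3.0 m/s but only reached about 1.5 m/s.
episode = [
    Transition((3.0, 0.0, 0.0), (1.0, 0.0, 0.0), 0.0),
    Transition((3.0, 0.0, 0.0), (1.5, 0.0, 0.1), 0.0),
]
relabeled = hindsight_relabel(episode)
```

Under these assumptions, both the original and the relabeled transitions would go into the replay buffer, so that an episode that failed its original command still provides a useful learning signal for the command it did achieve.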
