State Key Laboratory for Turbulence and Complex Systems, Department of Advanced Manufacturing and Robotics, College of Engineering, Peking University, Beijing 100871, China.
Tencent Robotics X, Shenzhen 518057, China.
Sensors (Basel). 2023 Dec 20;24(1):28. doi: 10.3390/s24010028.
In the field of quadruped robots, the classic motion control approach is based on model predictive control (MPC). However, MPC requires an accurate dynamics model of the robot, which is difficult to construct, and it struggles to achieve movements as agile as those of a biological dog. Owing to these limitations, researchers are increasingly turning to model-free learning methods, which greatly reduce the difficulty of modeling and engineering tuning while also lowering the real-time optimization burden. Inspired by the way humans and animals grow from learning to walk to moving fluently, this article proposes a hierarchical reinforcement learning framework that allows the motion controller to learn higher-level tasks. First, basic motion skills are learned from motion data captured from a biological dog. Then, building on these learned skills, the quadruped robot can focus on higher-level tasks without starting from low-level kinematics, which saves redundant training time. By applying domain randomization during training, the trained policy can be transferred to a physical robot without modification, and the resulting controller performs more biomimetic movements. With the proposed method, the agility and adaptability of the quadruped robot can be fully exploited to achieve efficient operation in complex terrains.
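The two key ideas in the abstract — a frozen low-level skill policy commanded by a high-level task policy, and domain randomization of physical parameters during training — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the network shapes, parameter ranges, and class names (`LowLevelSkillPolicy`, `HighLevelPolicy`, `randomize_dynamics`) are all assumptions for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

def randomize_dynamics():
    """Domain randomization: resample physical parameters each episode so the
    trained policy transfers to hardware without modification.
    The parameter names and ranges here are illustrative assumptions."""
    return {
        "ground_friction": rng.uniform(0.4, 1.2),
        "body_mass_scale": rng.uniform(0.9, 1.1),
        "motor_latency_s": rng.uniform(0.0, 0.02),
    }

class LowLevelSkillPolicy:
    """Stands in for basic motion skills pre-trained on dog motion-capture
    data; in the hierarchical setup its weights would be frozen."""
    def __init__(self, obs_dim, act_dim, latent_dim):
        self.W_obs = rng.standard_normal((act_dim, obs_dim)) * 0.1
        self.W_lat = rng.standard_normal((act_dim, latent_dim)) * 0.1

    def act(self, obs, skill_latent):
        # Joint-level actions conditioned on proprioception and the
        # commanded skill latent; tanh keeps outputs in [-1, 1].
        return np.tanh(self.W_obs @ obs + self.W_lat @ skill_latent)

class HighLevelPolicy:
    """Learns the task by emitting skill latents instead of raw joint
    commands, so training never starts from low-level kinematics."""
    def __init__(self, obs_dim, latent_dim):
        self.W = rng.standard_normal((latent_dim, obs_dim)) * 0.1

    def act(self, task_obs):
        return np.tanh(self.W @ task_obs)

obs_dim, act_dim, latent_dim = 36, 12, 8
low = LowLevelSkillPolicy(obs_dim, act_dim, latent_dim)
high = HighLevelPolicy(obs_dim, latent_dim)

params = randomize_dynamics()          # new dynamics for this episode
obs = rng.standard_normal(obs_dim)     # placeholder robot observation
action = low.act(obs, high.act(obs))   # hierarchical control at each step
assert action.shape == (act_dim,)
```

Only the high-level policy would be optimized for a new task; the low-level skills act as a reusable motor vocabulary, which is what saves the redundant training time mentioned above.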