Li Chunguang, Li Mengru, Tao Chongben
School of Computer and Information Engineering, Changzhou Institute of Technology, Changzhou, Jiangsu, China.
School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China.
Front Neurorobot. 2023 Aug 8;17:1205775. doi: 10.3389/fnbot.2023.1205775. eCollection 2023.
Given the dynamics and non-linear characteristics of biped robots, gait optimization is an extremely challenging task. To address this, a parallel heterogeneous policy Deep Reinforcement Learning (DRL) algorithm for gait optimization is proposed. First, the Deep Deterministic Policy Gradient (DDPG) algorithm serves as the main architecture: multiple biped robots run in parallel to interact with the environment, and the network is shared among them to improve training efficiency. In addition, heterogeneous experience replay replaces the traditional experience replay mechanism to make better use of the collected experience. Second, based on the walking characteristics of biped robots, a periodic gait is designed with reference to sinusoidal curves; it accounts for the foot lift height, walking period, foot lift speed, and ground contact force of the biped robot. Finally, because different environments and different biped robot models pose challenges for different optimization algorithms, a unified gait optimization framework for biped robots is established on the RoboCup3D platform. Comparative experiments conducted with this framework show that the proposed method makes the biped robot walk faster and more stably.
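As a rough illustration of what a periodic gait designed "with reference to sinusoidal curves" might look like, the following minimal sketch generates a half-sine vertical foot trajectory for two legs in antiphase. It is not the authors' formulation: the function names `foot_height` and `gait_sample`, the default values, and the choice of a half-sine swing with a flat stance phase are all assumptions made for illustration. Only foot lift height and walking period are modeled; foot lift speed and ground contact force, which the abstract also mentions, are left out.

```python
import math


def foot_height(t, period=1.0, lift_height=0.04, phase_offset=0.0):
    """Vertical foot position at time t (hypothetical half-sine profile).

    Assumption: the foot swings during the first half of each cycle and
    stays on the ground (height 0) during the second half.
    """
    # Normalized phase in [0, 1) within the current walking cycle.
    phase = (t / period + phase_offset) % 1.0
    if phase < 0.5:
        # Swing phase: rises from 0 to lift_height and back to 0.
        return lift_height * math.sin(2.0 * math.pi * phase)
    # Stance phase: foot rests on the ground.
    return 0.0


def gait_sample(t, period=1.0, lift_height=0.04):
    """Return (left, right) foot heights, with the legs half a cycle apart."""
    left = foot_height(t, period, lift_height, phase_offset=0.0)
    right = foot_height(t, period, lift_height, phase_offset=0.5)
    return left, right


if __name__ == "__main__":
    # Print one walking cycle sampled at eight points.
    for i in range(8):
        t = i / 8.0
        left, right = gait_sample(t)
        print(f"t={t:.3f}  left={left:.4f}  right={right:.4f}")
```

In an optimization setting such as the one described, parameters like `period` and `lift_height` would be the kind of quantities a learning algorithm adjusts, though how the paper parameterizes its gait is not stated in the abstract.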