Stability Control of a Biped Robot on a Dynamic Platform Based on Hybrid Reinforcement Learning.

Affiliation

Laboratory of Motion Generation and Analysis, Faculty of Engineering, Monash University, Clayton, VIC 3800, Australia.

Publication information

Sensors (Basel). 2020 Aug 10;20(16):4468. doi: 10.3390/s20164468.

Abstract

In this work, we introduced a novel hybrid reinforcement learning scheme to balance a biped robot (NAO) on an oscillating platform, where the rotation of the platform is treated as an external disturbance to the robot. The platform had two rotational degrees of freedom: pitch and roll. The state space comprised the position of the center of pressure (CoP) and the joint angles and joint velocities of both legs. The action space consisted of the joint angles of the ankles, knees, and hips, and incorporating inverse kinematics significantly reduced its dimension. During offline training, a model-based system estimator used novel hierarchical Gaussian processes to estimate the dynamics model of the system and to provide initial control inputs; the reduced action space of each joint was then obtained by minimizing the cost of reaching the desired stable state. Finally, a model-free optimizer based on DQN(λ) was introduced to fine-tune the initial control inputs, yielding the optimal control input for each joint at any state. The proposed reinforcement learning scheme not only avoided the distribution mismatch problem but also improved sample efficiency. Simulation results showed that the proposed hybrid reinforcement learning mechanism enabled the NAO robot to balance on an oscillating platform at different frequencies and magnitudes. Both control performance and robustness were ensured during the experiments.
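
The DQN(λ) fine-tuning stage trains on λ-return targets rather than one-step targets. The sketch below shows one common recursive form of the λ-return, G_t^λ = r_t + γ[(1 − λ) max_a Q(s_{t+1}, a) + λ G_{t+1}^λ], computed backwards over a stored trajectory segment. It is an illustrative assumption, not the authors' implementation: the abstract does not state which λ-return variant is used, and the function name `lambda_returns`, its parameters, and the default values of γ and λ are hypothetical. The bootstrapped values `next_q_max` would come from whatever Q-network the learner maintains.

```python
from typing import List, Sequence


def lambda_returns(
    rewards: Sequence[float],
    next_q_max: Sequence[float],
    dones: Sequence[bool],
    gamma: float = 0.99,  # hypothetical discount factor
    lam: float = 0.8,     # hypothetical trace-decay parameter lambda
) -> List[float]:
    """Compute recursive lambda-returns for one trajectory segment (sketch).

    rewards[t]    -- reward r_t observed after action a_t in state s_t
    next_q_max[t] -- max_a Q(s_{t+1}, a), e.g. from a target network
    dones[t]      -- True if s_{t+1} is terminal (no bootstrapping)
    """
    n = len(rewards)
    if len(next_q_max) != n or len(dones) != n:
        raise ValueError("rewards, next_q_max and dones must have equal length")

    returns = [0.0] * n
    g = 0.0
    # Sweep backwards so each G_t can reuse G_{t+1}.
    for t in reversed(range(n)):
        if dones[t]:
            # Terminal transition: the return is just the immediate reward.
            g = rewards[t]
        elif t == n - 1:
            # End of the stored segment: fall back to the one-step target.
            g = rewards[t] + gamma * next_q_max[t]
        else:
            # Blend the one-step bootstrap with the longer return G_{t+1}.
            g = rewards[t] + gamma * ((1.0 - lam) * next_q_max[t] + lam * g)
        returns[t] = g
    return returns


if __name__ == "__main__":
    # Toy 3-step segment with made-up numbers, for illustration only.
    print(lambda_returns([1.0, 1.0, 1.0], [4.0, 4.0, 4.0],
                         [False, False, False], gamma=0.5, lam=0.5))
    # prints [2.6875, 2.75, 3.0]
```

Setting `lam=0.0` reduces every target to the one-step Q-learning target r_t + γ max_a Q(s_{t+1}, a), while values closer to 1 propagate rewards further back along the trajectory; faster credit assignment of this kind is one commonly cited motivation for λ-returns.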
