Stability Control of a Biped Robot on a Dynamic Platform Based on Hybrid Reinforcement Learning.

Author information

Laboratory of Motion Generation and Analysis, Faculty of Engineering, Monash University, Clayton, VIC 3800, Australia.

Publication information

Sensors (Basel). 2020 Aug 10;20(16):4468. doi: 10.3390/s20164468.

DOI:10.3390/s20164468

PMID:32785092

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7472320/

Abstract

In this work, we introduced a novel hybrid reinforcement learning scheme to balance a biped robot (NAO) on an oscillating platform, where the rotation of the platform is considered as the external disturbance to the robot. The platform had two degrees of freedom in rotation, pitch and roll. The state space comprised the position of center of pressure, and joint angles and joint velocities of two legs. The action space consisted of the joint angles of ankles, knees, and hips. By adding the inverse kinematics techniques, the dimension of action space was significantly reduced. Then, a model-based system estimator was employed during the offline training procedure to estimate the dynamics model of the system by using novel hierarchical Gaussian processes, and to provide initial control inputs, after which the reduced action space of each joint was obtained by minimizing the cost of reaching the desired stable state. Finally, a model-free optimizer based on DQN (λ) was introduced to fine tune the initial control inputs, where the optimal control inputs were obtained for each joint at any state. The proposed reinforcement learning not only successfully avoided the distribution mismatch problem, but also improved the sample efficiency. Simulation results showed that the proposed hybrid reinforcement learning mechanism enabled the NAO robot to balance on an oscillating platform with different frequencies and magnitudes. Both control performance and robustness were guaranteed during the experiments.
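The abstract names DQN (λ) as the model-free optimizer but does not give its formulation. As a rough illustration only, the sketch below computes λ-return targets for one recorded trajectory using the standard recursive form (one-step bootstrapped targets blended with longer returns). The function name `lambda_returns`, its parameters, and the default values of `gamma` and `lam` are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def lambda_returns(rewards, next_q_max, dones, gamma=0.99, lam=0.8):
    """Compute lambda-return targets for a single trajectory (hypothetical helper).

    rewards[t]    : reward received at step t
    next_q_max[t] : max over actions of Q(s_{t+1}, a), e.g. from a target network
    dones[t]      : True if step t ended the episode
    """
    rewards = np.asarray(rewards, dtype=float)
    next_q_max = np.asarray(next_q_max, dtype=float)
    dones = np.asarray(dones, dtype=bool)

    T = len(rewards)
    returns = np.zeros(T)
    # Beyond the last recorded step only the bootstrap value is available.
    next_return = next_q_max[-1] if T else 0.0

    # Sweep backwards so each target can reuse the one that follows it.
    for t in reversed(range(T)):
        if dones[t]:
            # Terminal step: nothing to bootstrap from.
            returns[t] = rewards[t]
        else:
            # Blend the one-step bootstrap with the longer lambda-return.
            blended = (1.0 - lam) * next_q_max[t] + lam * next_return
            returns[t] = rewards[t] + gamma * blended
        next_return = returns[t]
    return returns
```

With `lam=0` this reduces to ordinary one-step DQN targets, and with `lam=1` it becomes a discounted return that bootstraps only at the end of the trajectory. How the paper combines these targets with the initial control inputs from the Gaussian-process estimator is not stated in the abstract.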

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6819/7472320/0418077e5fbf/sensors-20-04468-g001.jpg

Similar articles

LORM: a novel reinforcement learning framework for biped gait control.

PeerJ Comput Sci. 2022 Mar 28;8:e927. doi: 10.7717/peerj-cs.927. eCollection 2022.

Evaluation of linearly solvable Markov decision process with dynamic model learning in a mobile robot navigation task.

Front Neurorobot. 2013 Apr 5;7:7. doi: 10.3389/fnbot.2013.00007. eCollection 2013.

A parallel heterogeneous policy deep reinforcement learning algorithm for bipedal walking motion design.

Front Neurorobot. 2023 Aug 8;17:1205775. doi: 10.3389/fnbot.2023.1205775. eCollection 2023.

A Stability Training Method of Legged Robots Based on Training Platforms and Reinforcement Learning with Its Simulation and Experiment.

Micromachines (Basel). 2022 Aug 31;13(9):1436. doi: 10.3390/mi13091436.

Humanoid Locomotion and the Brain

Robust disturbance rejection control of a biped robotic system using high-order extended state observer.

ISA Trans. 2016 May;62:276-86. doi: 10.1016/j.isatra.2016.02.003. Epub 2016 Feb 28.

A Spring Compensation Method for a Low-Cost Biped Robot Based on Whole Body Control.

Biomimetics (Basel). 2023 Mar 21;8(1):126. doi: 10.3390/biomimetics8010126.

Multi-robot task allocation in e-commerce RMFS based on deep reinforcement learning.

Math Biosci Eng. 2023 Jan;20(2):1903-1918. doi: 10.3934/mbe.2023087. Epub 2022 Nov 8.

Characterization of continuum robot arms under reinforcement learning and derived improvements.

Front Robot AI. 2022 Sep 1;9:895388. doi: 10.3389/frobt.2022.895388. eCollection 2022.

Cited by

LORM: a novel reinforcement learning framework for biped gait control.

PeerJ Comput Sci. 2022 Mar 28;8:e927. doi: 10.7717/peerj-cs.927. eCollection 2022.

References

Multiobjective Evolution of Biped Robot Gaits Using Advanced Continuous Ant-Colony Optimized Recurrent Neural Networks.

IEEE Trans Cybern. 2018 Jun;48(6):1910-1922. doi: 10.1109/TCYB.2017.2718037. Epub 2017 Jun 30.

Gaussian Processes for Data-Efficient Learning in Robotics and Control.

IEEE Trans Pattern Anal Mach Intell. 2015 Feb;37(2):408-23. doi: 10.1109/TPAMI.2013.218.

SVR versus neural-fuzzy network controllers for the sagittal balance of a biped robot.

IEEE Trans Neural Netw. 2009 Dec;20(12):1885-97. doi: 10.1109/TNN.2009.2032183. Epub 2009 Oct 2.
