Robot Policy Improvement With Natural Evolution Strategies for Stable Nonlinear Dynamical System.

Author information

Hu Yingbai, Chen Guang, Li Zhijun, Knoll Alois

Publication information

IEEE Trans Cybern. 2023 Jun;53(6):4002-4014. doi: 10.1109/TCYB.2022.3192049. Epub 2023 May 17.

Abstract

Robot learning through kinesthetic teaching is a promising way to clone human behaviors, but with small amounts of data its performance on complex tasks is limited by compounding errors. To improve the robustness and adaptability of imitation learning, a hierarchical learning strategy is proposed: low-level learning consists only of behavioral cloning via supervised learning, and high-level learning performs policy improvement. First, a Gaussian mixture model (GMM)-based dynamical system is formulated to encode a motion from the demonstration. Using the Lyapunov stability theorem, we then derive sufficient conditions on the GMM parameters that guarantee global stability of the dynamical system from any initial state. Because imitation learning generally has to reason about the motion well into the future across a wide range of tasks, improving the adaptability of the learning method through policy improvement is important. Finally, a method based on exponential natural evolution strategies is proposed to optimize the parameters of the dynamical system associated with the stiffness of variable impedance control; the exploration noise is constrained to satisfy the stability conditions of the dynamical system in the exploration space, which guarantees global stability. Empirical evaluations are conducted on manipulators in different scenarios, including motion planning with obstacle avoidance and stiffness learning.
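The abstract combines two ingredients: a GMM-weighted dynamical system whose parameters satisfy a Lyapunov-based sufficient stability condition, and an exponential natural evolution strategy (xNES) whose exploration stays inside the stable parameter set. The sketch below illustrates both under stated assumptions; it is not the authors' implementation. It assumes the SEDS-style form xdot = sum_k h_k(x) A_k x with the attractor at the origin and the sufficient condition A_k + A_k^T negative definite (with V(x) = x^T x / 2); the paper's exact conditions may differ. Enforcing stability by rejection sampling, and keeping the search mean stable, are our own assumptions about how the exploration noise is constrained. The GMM values, start point, via-point cost, and the names weights, is_stable, rollout, cost, sample_stable, and xnes_stable are hypothetical, and the stiffness/impedance part is not modeled.

```python
import numpy as np

# Hypothetical 2-D example: attractor at the origin, K = 2 Gaussian components with
# made-up parameters (in the paper the GMM would be fitted to demonstrations).
MU = np.array([[-1.0, 0.6], [-0.4, 0.2]])
SIGMA = np.array([0.3 * np.eye(2), 0.2 * np.eye(2)])
SIGMA_INV = np.linalg.inv(SIGMA)
LOG_NORM = np.log([0.5, 0.5]) - 0.5 * np.linalg.slogdet(2 * np.pi * SIGMA)[1]
K, D = MU.shape
X0, VIA = np.array([-1.5, 1.0]), np.array([-0.8, 0.0])  # hypothetical start and via-point


def weights(x):
    """h_k(x): GMM responsibilities, computed in log-space so they never all underflow."""
    diff = x - MU
    logw = LOG_NORM - 0.5 * np.einsum("kd,kde,ke->k", diff, SIGMA_INV, diff)
    w = np.exp(logw - logw.max())
    return w / w.sum()


def is_stable(A):
    """Assumed SEDS-style sufficient condition: A_k + A_k^T negative definite for all k.
    With V(x) = x^T x / 2 this gives dV/dt = sum_k h_k(x) x^T A_k x < 0 for x != 0."""
    return all(np.linalg.eigvalsh(Ak + Ak.T).max() < 0 for Ak in A)


def rollout(A, x0, dt=0.02, steps=300):
    """Explicit-Euler integration of xdot = sum_k h_k(x) A_k x (no bias, so x* = 0)."""
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(steps):
        x = x + dt * np.einsum("k,kde,e->d", weights(x), A, x)
        traj.append(x.copy())
    return np.array(traj)


def cost(theta):
    """Hypothetical task cost: pass close to VIA at some point and end near the origin."""
    traj = rollout(theta.reshape(K, D, D), X0)
    return np.min(np.sum((traj - VIA) ** 2, axis=1)) + np.sum(traj[-1] ** 2)


def sym_expm(G):
    """Matrix exponential of a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(G)
    return (V * np.exp(w)) @ V.T


def sample_stable(rng, mu, sigma, B, tries=1000):
    """Draw s ~ N(0, I), rejecting samples whose parameters violate the stability condition."""
    for _ in range(tries):
        s = rng.standard_normal(mu.size)
        if is_stable((mu + sigma * B @ s).reshape(K, D, D)):
            return s
    return np.zeros(mu.size)  # fall back to the mean, which is kept stable below


def xnes_stable(theta0, sigma0=0.1, iters=30, seed=0):
    rng = np.random.default_rng(seed)
    d = theta0.size
    n = 4 + int(3 * np.log(d))                           # default xNES population size
    eta = (9 + 3 * np.log(d)) / (5 * d * np.sqrt(d))     # default rate for sigma and B
    u = np.maximum(0, np.log(n / 2 + 1) - np.log(np.arange(1, n + 1)))
    u = u / u.sum() - 1 / n                              # rank-based utilities, best first
    mu, sigma, B = theta0.astype(float), sigma0, np.eye(d)
    best_theta, best_cost = mu.copy(), cost(mu)
    for _ in range(iters):
        S = np.array([sample_stable(rng, mu, sigma, B) for _ in range(n)])
        thetas = np.array([mu + sigma * B @ s for s in S])  # stable candidate parameters
        costs = np.array([cost(t) for t in thetas])
        order = np.argsort(costs)                        # lowest cost gets the largest utility
        if costs[order[0]] < best_cost:
            best_theta, best_cost = thetas[order[0]].copy(), costs[order[0]]
        S = S[order]
        G_delta = u @ S
        G_M = np.einsum("i,ij,ik->jk", u, S, S) - u.sum() * np.eye(d)
        G_sigma = np.trace(G_M) / d
        new_mu = mu + sigma * B @ G_delta
        if is_stable(new_mu.reshape(K, D, D)):           # keep the search mean stable
            mu = new_mu
        sigma *= np.exp(0.5 * eta * G_sigma)
        B = B @ sym_expm(0.5 * eta * (G_M - G_sigma * np.eye(d)))
    return best_theta, best_cost


theta0 = np.array([-np.eye(2), -np.eye(2)]).ravel()     # A_1 = A_2 = -I, trivially stable
theta, c = xnes_stable(theta0)
print(cost(theta0), c, is_stable(theta.reshape(K, D, D)))
```

Every candidate the search evaluates passes the stability check, so the returned parameters keep the origin as a globally attracting equilibrium of the continuous-time system, while the cost only decides which stable candidate is preferred. A reparameterization that is stable by construction (for example A_k = -(L_k L_k^T + eps I) + (S_k - S_k^T)) would be an alternative to rejection sampling; which approach the paper uses is not stated in the abstract.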
