Tutsoy Onder, Barkana Duygun Erol, Balikci Kemal
IEEE Trans Cybern. 2023 Jan;53(1):329-337. doi: 10.1109/TCYB.2021.3091680. Epub 2022 Dec 23.
Model-free control approaches require advanced exploration-exploitation policies to accomplish practical tasks such as learning bipedal robot walking in unstructured environments. In this article, we first construct a comprehensive exploration-exploitation policy that carries quality knowledge about the long-term predictor, the control policy, and the control signal of the model-free algorithms. As a result, the developed model-free algorithm continues exploring by adjusting its unknown parameters until the desired learning and control are accomplished. Second, we provide a fully model-free adaptive law that is enriched with the exploration-exploitation policy and derived step by step in exact analogy with the model-based solution. The resulting adaptive control law accounts for control signal saturation and control signal (input) delay. A Lyapunov stability analysis ensures the convergence of the adaptive law, which can also be used in intelligent control approaches. Third, we implement the adaptive algorithm in real time on a challenging benchmark system: a fourth-order, input-saturated, time-delayed, underactuated manipulator with coupled dynamics. The results show that the proposed adaptive algorithm explores larger state-action spaces and addresses the vanishing gradient problem in both learning and control. The results also indicate that the learning and control properties of the adaptive algorithm are optimized as required.
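The abstract names two input constraints, control signal saturation and control signal (input) delay, without giving their form. As a rough illustration only, the following Python sketch shows one common discrete-time way to model such an input channel: clip the commanded signal to fixed limits, then hold it in a first-in, first-out buffer for a fixed number of samples. The names `saturate` and `DelayedSaturatedInput`, the limits, and the delay length are all hypothetical and are not taken from the article; this is not the authors' adaptive law.

```python
from collections import deque


def saturate(u, u_min, u_max):
    """Clip a commanded control value to the actuator limits (assumed fixed)."""
    return max(u_min, min(u, u_max))


class DelayedSaturatedInput:
    """Hypothetical input channel: saturation followed by a fixed sample delay."""

    def __init__(self, delay_steps, u_min, u_max, initial=0.0):
        self.u_min = u_min
        self.u_max = u_max
        # Pre-fill the buffer so the first `delay_steps` outputs equal `initial`.
        self._buffer = deque([initial] * delay_steps)

    def step(self, u):
        """Queue the saturated command and return the one issued delay_steps ago."""
        self._buffer.append(saturate(u, self.u_min, self.u_max))
        return self._buffer.popleft()


# Illustrative values only: limits of +/-1 and a two-sample delay.
channel = DelayedSaturatedInput(delay_steps=2, u_min=-1.0, u_max=1.0)
applied = [channel.step(u) for u in [5.0, -3.0, 0.5, 0.2]]
print(applied)  # [0.0, 0.0, 1.0, -1.0]
```

In this sketch the plant would receive `applied` rather than the commanded values, which is why a control law that ignores saturation and delay can behave differently in practice than on paper.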