Bian Tao, Jiang Zhong-Ping
IEEE Trans Neural Netw Learn Syst. 2022 Jul;33(7):2781-2790. doi: 10.1109/TNNLS.2020.3045087. Epub 2022 Jul 6.
This article studies the adaptive optimal control problem for continuous-time nonlinear systems described by differential equations. The key strategy is to exploit the value iteration (VI) method, first proposed by Bellman in 1957 as a fundamental tool for solving dynamic programming problems. However, previous VI methods have been devoted exclusively to Markov decision processes and discrete-time dynamical systems. In this article, we aim to fill this gap by developing a new continuous-time VI method that can be applied to both adaptive and nonadaptive optimal control problems for continuous-time systems described by differential equations. Like traditional VI, the continuous-time VI algorithm retains the useful property that no initial admissible control policy needs to be known. As a direct application of the proposed VI method, a new class of adaptive optimal controllers is obtained for nonlinear systems with completely unknown dynamics. A learning-based control algorithm is proposed to show how robust optimal controllers can be learned directly from real-time data. Finally, two examples are given to illustrate the efficacy of the proposed methodology.
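As a rough illustration of why VI needs no initial admissible policy, the sketch below treats only the scalar linear-quadratic special case, with dynamics dx/dt = a x + b u and cost integrand q x^2 + r u^2. In that case the value function has the form V(x) = p x^2, and a continuous-time VI step can be approximated by forward-Euler integration of the Riccati differential equation starting from p = 0. This is a toy with known dynamics, not the algorithm of the article; the function name, the parameters, and the fixed step size are all assumptions made for illustration.

```python
import math


def continuous_time_vi_scalar(a, b, q, r, step=0.01, iters=5000):
    """Hypothetical helper: Euler-discretized value iteration for a scalar LQ problem.

    Integrates dp/ds = 2*a*p + q - (b*p)**2 / r forward from p = 0.
    No stabilizing (admissible) initial policy is required.
    """
    p = 0.0  # start from the zero value function, as in classical VI
    for _ in range(iters):
        # Residual of the algebraic Riccati equation at the current estimate.
        residual = 2.0 * a * p + q - (b * p) ** 2 / r
        p += step * residual
    return p


# Open-loop unstable plant (a > 0), so u = 0 is NOT an admissible policy.
a, b, q, r = 1.0, 1.0, 1.0, 1.0
p = continuous_time_vi_scalar(a, b, q, r)

gain = b * p / r                 # feedback law u = -gain * x
closed_loop_pole = a - b * gain  # should be negative (stable closed loop)
print(p, closed_loop_pole)
```

For these parameter values the algebraic Riccati equation reduces to p^2 - 2p - 1 = 0, whose positive root is 1 + sqrt(2), so the resulting closed-loop pole is -sqrt(2). The iteration approaches that root from zero even though the uncontrolled plant is unstable, which is the property the abstract highlights.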