IEEE Trans Neural Netw Learn Syst. 2015 Apr;26(4):866-79. doi: 10.1109/TNNLS.2015.2401334. Epub 2015 Mar 2.
In this paper, a novel iterative adaptive dynamic programming (ADP)-based infinite-horizon self-learning optimal control algorithm, called the generalized policy iteration algorithm, is developed for nonaffine discrete-time (DT) nonlinear systems. Generalized policy iteration unifies the policy iteration and value iteration algorithms of ADP within a single framework. The developed algorithm permits an arbitrary positive semidefinite function to initialize the iteration and employs two iteration indices, one for policy improvement and one for policy evaluation. For the first time, the convergence, admissibility, and optimality properties of the generalized policy iteration algorithm for DT nonlinear systems are analyzed. Neural networks are used to implement the developed algorithm. Finally, numerical examples are presented to illustrate its performance.
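To make the two-index structure concrete, the following is a minimal tabular sketch of generalized policy iteration on a discretized scalar system. The dynamics, utility, grids, and sweep counts below are illustrative assumptions, not taken from the paper; the point is only that an outer index drives policy improvement while an inner index performs a finite number of policy-evaluation sweeps (one sweep recovers value iteration, infinitely many recovers policy iteration).

```python
import numpy as np

# Hypothetical nonaffine DT system x_{k+1} = f(x_k, u_k) and utility U(x, u)
# (illustrative choices only, not from the paper).
def f(x, u):
    return 0.8 * np.sin(x) + u + 0.1 * u**3  # nonaffine in the control u

def utility(x, u):
    return x**2 + u**2

# Discretized state and control grids.
xs = np.linspace(-2.0, 2.0, 81)
us = np.linspace(-1.0, 1.0, 41)

def nearest(x):
    """Index of the grid point closest to state x."""
    return int(np.argmin(np.abs(xs - x)))

V = np.zeros(len(xs))                # arbitrary positive semidefinite start
policy = np.zeros(len(xs), dtype=int)

for i in range(50):                  # outer index: policy improvement
    # Policy improvement: greedy control w.r.t. the current value function.
    for s, x in enumerate(xs):
        costs = [utility(x, u) + V[nearest(f(x, u))] for u in us]
        policy[s] = int(np.argmin(costs))
    # Partial policy evaluation: a finite number of inner sweeps (index j).
    for j in range(3):
        V = np.array([utility(x, us[policy[s]]) +
                      V[nearest(f(x, us[policy[s]]))]
                      for s, x in enumerate(xs)])
```

Because the uncontrolled map already contracts toward the origin in this toy setup, the undiscounted infinite-horizon cost stays finite and the iteration settles to a nonnegative value function with zero cost at the origin.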