IEEE Trans Cybern. 2016 Mar;46(3):840-53. doi: 10.1109/TCYB.2015.2492242. Epub 2015 Nov 2.
In this paper, a value iteration adaptive dynamic programming (ADP) algorithm is developed to solve infinite-horizon undiscounted optimal control problems for discrete-time nonlinear systems. The proposed value iteration ADP algorithm allows an arbitrary positive semi-definite function to be used to initialize the iteration. A novel convergence analysis is developed to guarantee that the iterative value function converges to the optimal performance index function. It is proven that, depending on the initial function, the iterative value function is monotonically nonincreasing, monotonically nondecreasing, or nonmonotonic, and in every case converges to the optimum. For the first time, admissibility properties of the iterative control laws are established for value iteration algorithms. In particular, new termination criteria are established to guarantee the effectiveness of the iterative control laws. To facilitate implementation of the iterative ADP algorithm, neural networks are used to approximate the iterative value function and to compute the iterative control law, respectively. Finally, two simulation examples illustrate the performance of the proposed method.
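As a rough illustration only, the value iteration recursion V_{i+1}(x) = min_u {U(x, u) + V_i(F(x, u))}, started from a positive semi-definite V_0, can be sketched in Python. Everything specific here is an assumption and is not taken from the paper: the scalar system `F`, the quadratic utility, a state grid with linear interpolation standing in for the paper's neural-network approximators, and a simple sup-norm stopping rule standing in for the paper's termination criteria (which the abstract does not state). Starting from V_0 = 0 gives a nondecreasing sequence, while starting from a large quadratic gives a nonincreasing one. Both reach the same limit. An initial function lying neither everywhere above nor everywhere below the optimum may give a nonmonotonic sequence.

```python
import numpy as np

# Hypothetical example system (not from the paper): x_{k+1} = F(x_k, u_k).
def F(x, u):
    return x + 0.1 * np.sin(x) + u

# Hypothetical positive definite utility U(x, u) = x^2 + u^2 (undiscounted).
def utility(x, u):
    return x ** 2 + u ** 2

X = np.linspace(-2.0, 2.0, 201)       # state grid (replaces the critic network)
U_GRID = np.linspace(-3.0, 3.0, 301)  # candidate controls (replaces the action network)

def bellman(V):
    """One value-iteration step: V_{i+1}(x) = min_u {U(x, u) + V_i(F(x, u))}.

    V_i between grid points is obtained by linear interpolation; next states
    leaving the grid are clipped to its boundary (an approximation choice).
    Returns the new value function and the greedy control law on the grid.
    """
    xs = X[:, None]
    us = U_GRID[None, :]
    x_next = np.clip(F(xs, us), X[0], X[-1])
    q = utility(xs, us) + np.interp(x_next, X, V)
    return q.min(axis=1), U_GRID[q.argmin(axis=1)]

def value_iteration(V0, eps=1e-8, max_iter=1000):
    """Iterate from a positive semi-definite V0 until the sup-norm change < eps.

    The eps rule is a simple placeholder, not the paper's termination criteria.
    """
    V = V0.copy()
    history = [V]
    for _ in range(max_iter):
        V_next, policy = bellman(V)
        history.append(V_next)
        if np.max(np.abs(V_next - V)) < eps:
            return V_next, policy, history
        V = V_next
    raise RuntimeError("value iteration did not converge within max_iter")

# Two positive semi-definite initial functions.
V_low, policy_low, hist_low = value_iteration(np.zeros_like(X))  # from below
V_high, policy_high, hist_high = value_iteration(10.0 * X ** 2)  # from above
print("max |V_low - V_high| =", np.max(np.abs(V_low - V_high)))
```

Because the interpolation weights are nonnegative and sum to one, the grid operator `bellman` stays monotone. This monotonicity is what makes the two sequences bracket the common limit in this toy setting.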