IEEE Trans Neural Netw Learn Syst. 2017 Mar;28(3):500-509. doi: 10.1109/TNNLS.2015.2503980. Epub 2015 Dec 22.
In this paper, we consider discrete-time infinite horizon problems of optimal control to a terminal set of states. These are the problems that are often taken as the starting point for adaptive dynamic programming. Under very general assumptions, we establish the uniqueness of the solution of Bellman's equation, and we provide convergence results for value and policy iterations.
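The value iteration method the abstract refers to can be illustrated on a toy problem. The sketch below is not the paper's formulation; it is a minimal, assumed example: a small deterministic shortest-path problem with positive stage costs and a terminal (goal) state, where repeated Bellman updates converge to the optimal cost-to-go, and a greedy policy is then read off from the converged values.

```python
# Minimal sketch of value iteration for a shortest-path problem with a
# terminal state (an assumed toy example, not the paper's general setting).

# Toy deterministic problem: states 0..3, state 3 is the terminal set.
# transitions[s] maps each action name to (next_state, stage_cost > 0).
transitions = {
    0: {"a": (1, 1.0), "b": (2, 4.0)},
    1: {"a": (3, 1.0)},
    2: {"a": (3, 1.0)},
}
TERMINAL = 3  # cost-to-go at the terminal state is 0 by definition

def value_iteration(tol=1e-9, max_iter=1000):
    """Iterate the Bellman update V(s) <- min_a [cost(s,a) + V(next(s,a))]."""
    V = {0: 0.0, 1: 0.0, 2: 0.0, TERMINAL: 0.0}
    for _ in range(max_iter):
        delta = 0.0
        for s, acts in transitions.items():
            new_v = min(c + V[ns] for (ns, c) in acts.values())
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:  # stop once the updates have converged
            break
    return V

V = value_iteration()

# Extract a greedy (optimal) policy from the converged cost-to-go values.
policy = {s: min(acts, key=lambda a: acts[a][1] + V[acts[a][0]])
          for s, acts in transitions.items()}

print(V[0])       # optimal cost-to-go from state 0
print(policy[0])  # greedy action at state 0
```

In this toy instance the update converges after a few sweeps: the cost-to-go from state 0 is 2.0 (via state 1), and the greedy policy at state 0 picks action "a". Policy iteration would alternate between such a policy-extraction step and an exact evaluation of the current policy's cost.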