IEEE Trans Neural Netw Learn Syst. 2017 Aug;28(8):1929-1940. doi: 10.1109/TNNLS.2017.2654324. Epub 2017 Feb 1.
This paper presents a Hamiltonian-driven framework of adaptive dynamic programming (ADP) for continuous-time nonlinear systems, which consists of the evaluation of an admissible control, the comparison of two different admissible policies with respect to the corresponding performance function, and the performance improvement of an admissible control. It is shown that the Hamiltonian can serve as the temporal difference for continuous-time systems. In the Hamiltonian-driven ADP, the critic network is trained to output the value gradient. The inner product between the critic output and the system dynamics then produces the value derivative. Under some conditions, the minimization of the Hamiltonian functional is equivalent to the value function approximation. An iterative algorithm, starting from an arbitrary admissible control, is presented for approximating the optimal control, together with a convergence proof. The implementation is accomplished by neural network approximation. Two simulation studies demonstrate the effectiveness of Hamiltonian-driven ADP.
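To illustrate the role of the Hamiltonian as a continuous-time temporal difference, the following sketch uses standard ADP notation rather than the paper's own symbols: an assumed affine system \dot{x} = f(x) + g(x)u, cost rate r(x,u), and value function V with gradient \nabla V.

% Hamiltonian built from the value gradient and the system dynamics
H(x, u, \nabla V) = \nabla V(x)^{\top}\bigl(f(x) + g(x)u\bigr) + r(x, u)

% Along an admissible trajectory the value derivative is the inner product
% of the value gradient with the dynamics, so H plays the role of the
% continuous-time temporal-difference (Bellman) residual:
\dot{V}(x) = \nabla V(x)^{\top}\bigl(f(x) + g(x)u\bigr) = -r(x, u) \;\Longleftrightarrow\; H(x, u, \nabla V) = 0

% The optimal value function satisfies the HJB optimality condition
0 = \min_{u} H\bigl(x, u, \nabla V^{*}\bigr)

Under these assumptions, training a critic \hat{V}(x; W) to drive the residual H(x, u, \nabla\hat{V}) toward zero is the continuous-time analogue of minimizing a temporal-difference error, which is the sense in which Hamiltonian minimization corresponds to value function approximation.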