IEEE Trans Cybern. 2017 Oct;47(10):3341-3354. doi: 10.1109/TCYB.2016.2623859. Epub 2016 Nov 22.
This paper considers the model-free optimal control problem for general discrete-time nonlinear systems and develops a data-based policy gradient adaptive dynamic programming (PGADP) algorithm for designing an adaptive optimal controller. Using offline and online data rather than a mathematical model of the system, the PGADP algorithm improves the control policy with a gradient descent scheme. The convergence of the PGADP algorithm is proved by demonstrating that the constructed Q-function sequence converges to the optimal Q-function. Based on the PGADP algorithm, an adaptive control method is developed with an actor-critic structure and the method of weighted residuals. Its convergence properties are analyzed, showing that the approximate Q-function converges to its optimum. Computer simulation results demonstrate the effectiveness of the PGADP-based adaptive control method.
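The idea of improving a policy by gradient descent on a Q-function learned from data can be illustrated with a minimal sketch. This is not the paper's actor-critic implementation: it assumes a hypothetical scalar linear plant (`A_TRUE`, `B_TRUE`), a quadratic stage cost, a linear policy u = -k x, and a quadratic Q-function fitted by least squares instead of the method of weighted residuals. All names and constants are illustrative assumptions. The plant parameters are used only to generate data; the learning steps see only the samples (x, u, x').

```python
import numpy as np

# Hypothetical plant x' = A_TRUE*x + B_TRUE*u (an assumption for this sketch).
# It is used ONLY to generate data; the learner never reads A_TRUE or B_TRUE.
A_TRUE, B_TRUE = 0.9, 0.5


def stage_cost(x, u):
    # Assumed quadratic cost r(x, u) = x^2 + u^2.
    return x**2 + u**2


def features(x, u):
    # Quadratic basis, so that Q(x, u) = w[0]*x^2 + w[1]*x*u + w[2]*u^2.
    return np.stack([x**2, x * u, u**2], axis=-1)


def collect_data(n, seed=0):
    # Offline data set of transitions (x, u, x') with exploratory inputs.
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, n)
    u = rng.uniform(-1.0, 1.0, n)
    return x, u, A_TRUE * x + B_TRUE * u


def evaluate_q(k, x, u, x_next):
    # Policy evaluation from data: fit w so that
    # Q(x, u) = r(x, u) + Q(x', -k*x') holds on every sample.
    diff = features(x, u) - features(x_next, -k * x_next)
    w, *_ = np.linalg.lstsq(diff, stage_cost(x, u), rcond=None)
    return w


def policy_gradient_step(k, w, step):
    # dQ/du evaluated at u = -k*x equals (w[1] - 2*w[2]*k) * x.
    # A descent step on u therefore shifts the feedback gain k as follows.
    return k + step * (w[1] - 2.0 * w[2] * k)


def run(iterations=100, step=0.1):
    x, u, x_next = collect_data(200)
    k = 0.0  # initial policy u = 0, stabilizing here because |A_TRUE| < 1
    for _ in range(iterations):
        w = evaluate_q(k, x, u, x_next)
        k = policy_gradient_step(k, w, step)
    return k


k_learned = run()
print("learned feedback gain:", k_learned)
```

In this linear-quadratic special case the learned gain can be checked against the gain obtained from the discrete-time Riccati equation, which gives a simple sanity test of the evaluate-then-descend loop. The step size and sample count are arbitrary choices for the sketch.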