IEEE Trans Neural Netw Learn Syst. 2015 Jan;26(1):140-51. doi: 10.1109/TNNLS.2014.2358227. Epub 2014 Oct 8.
This paper presents a partially model-free adaptive optimal control solution to the deterministic nonlinear discrete-time (DT) tracking control problem in the presence of input constraints. The tracking error dynamics and reference trajectory dynamics are first combined to form an augmented system. Then, a new discounted performance function based on the augmented system is presented for the optimal nonlinear tracking problem. In contrast to the standard solution, which finds the feedforward and feedback terms of the control input separately, the minimization of the proposed discounted performance function yields both the feedback and feedforward parts of the control input simultaneously. This enables the input constraints to be encoded into the optimization problem using a nonquadratic performance function. The DT tracking Bellman equation and tracking Hamilton-Jacobi-Bellman (HJB) equation are derived. An actor-critic-based reinforcement learning algorithm is used to learn the solution to the tracking HJB equation online without requiring knowledge of the system drift dynamics. That is, two neural networks (NNs), namely, an actor NN and a critic NN, are tuned online and simultaneously to generate the optimal bounded control policy. A simulation example is given to show the effectiveness of the proposed method.
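The scheme described above can be sketched in a few lines. The following is a minimal illustrative example, not the paper's actual simulation: the scalar plant, the constant reference, the quadratic critic features, the gains, and the learning rates are all assumptions made here. It combines the ingredients the abstract names: an augmented state built from the tracking error and the reference, a discounted cost with a nonquadratic input penalty, a tanh-saturated actor that keeps the control within its bound, and online temporal-difference tuning of the actor and critic weights.

```python
import numpy as np

np.random.seed(0)

# Illustrative scalar plant x_{k+1} = a*x_k + b*u_k tracking a constant
# reference r (all values below are assumptions for this sketch).
a, b = 0.9, 0.5
r = 1.0                 # reference trajectory (constant here)
u_max = 0.8             # input constraint |u| <= u_max
gamma = 0.95            # discount factor of the performance function
alpha_c, alpha_a = 0.01, 0.005   # critic / actor learning rates

def phi(e, r):
    """Quadratic features of the augmented state [e, r]."""
    return np.array([e * e, e * r, r * r])

def penalty(u):
    """Nonquadratic input penalty W(u) = 2 * int_0^u u_max*artanh(v/u_max) dv,
    evaluated in closed form; it grows steeply as |u| approaches u_max."""
    s = np.clip(u / u_max, -0.999999, 0.999999)
    return 2.0 * u_max * u * np.arctanh(s) + u_max**2 * np.log(1.0 - s * s)

w_c = np.zeros(3)       # critic NN weights: V(e, r) ~ w_c @ phi(e, r)
w_a = np.zeros(3)       # actor NN weights:  u = u_max * tanh(w_a @ phi(e, r))

e = 2.0                 # initial tracking error
for k in range(2000):
    f = phi(e, r)
    u = u_max * np.tanh(w_a @ f)            # bounded control, |u| < u_max by construction
    e_next = a * e + b * u + (a - 1.0) * r  # tracking-error dynamics of the augmented system
    f_next = phi(e_next, r)

    # Critic: temporal-difference update on the tracking Bellman equation
    cost = e * e + penalty(u)
    delta = cost + gamma * (w_c @ f_next) - (w_c @ f)
    w_c += alpha_c * delta * f

    # Actor: gradient step on cost + gamma*V(e_next) with respect to the control
    s = np.clip(u / u_max, -0.999999, 0.999999)
    dV_de_next = w_c @ np.array([2.0 * e_next, r, 0.0])
    grad_u = 2.0 * u_max * np.arctanh(s) + gamma * b * dV_de_next
    w_a -= alpha_a * grad_u * u_max * (1.0 - np.tanh(w_a @ f) ** 2) * f

    e = e_next
```

The tanh saturation in the actor paired with the matched arctanh-integral penalty is the standard nonquadratic-functional device for encoding input constraints into the optimization; note that only measured transitions (e, u, e_next) enter the updates, which is what makes the tuning partially model-free with respect to the drift dynamics.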