Wang Ning, Gao Ying, Zhao Hong, Ahn Choon Ki
IEEE Trans Neural Netw Learn Syst. 2021 Jul;32(7):3034-3045. doi: 10.1109/TNNLS.2020.3009214. Epub 2021 Jul 6.
In this article, a novel reinforcement learning-based optimal tracking control (RLOTC) scheme is established for an unmanned surface vehicle (USV) in the presence of complex unknowns, including dead-zone input nonlinearities, system dynamics, and disturbances. To be specific, dead-zone nonlinearities are decoupled into input-dependent sloped controls and unknown biases, which are encapsulated into lumped unknowns within the tracking error dynamics. Neural network (NN) approximators are further deployed to adaptively identify the complex unknowns and to facilitate a Hamilton-Jacobi-Bellman (HJB) equation that formulates the optimal tracking problem. In order to derive a practically optimal solution, an actor-critic reinforcement learning framework is built by employing adaptive NN identifiers to recursively approximate the optimal policy and cost function. Eventually, theoretical analysis shows that the entire RLOTC scheme renders tracking errors convergent to an arbitrarily small neighborhood of the origin, subject to optimal cost. Simulation results and comprehensive comparisons on a prototype USV demonstrate the effectiveness and superiority of the proposed scheme.
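The actor-critic idea summarized above can be illustrated with a minimal sketch. This is not the paper's RLOTC laws (which use NN approximators over USV error dynamics with dead-zone decoupling); it is a generic actor-critic temporal-difference scheme on a hypothetical scalar tracking-error system `e_{k+1} = e_k + dt*u_k` with quadratic stage cost, where the critic fits the cost function and the actor improves a feedback gain from the critic's estimate:

```python
import numpy as np

# Illustrative actor-critic sketch (assumed toy problem, not the paper's design):
# scalar tracking-error dynamics e_{k+1} = e_k + dt * u_k with stage cost
# c = q*e^2 + r*u^2. The critic approximates the cost function V(e) ~ w*e^2
# via temporal-difference (TD) updates; the actor is a linear feedback
# policy u = -k*e whose gain descends the critic-estimated cost.

dt, q, r = 0.1, 1.0, 0.1
w, k = 0.0, 0.0              # critic weight and actor gain, both learned
alpha_c, alpha_a = 0.05, 0.02  # critic / actor learning rates

rng = np.random.default_rng(0)
for episode in range(200):
    e = rng.uniform(-1.0, 1.0)      # random initial tracking error
    for _ in range(50):
        u = -k * e
        e_next = e + dt * u
        cost = q * e**2 + r * u**2
        # TD error for the quadratic critic V(e) = w * e^2
        td = cost + w * e_next**2 - w * e**2
        w += alpha_c * td * e**2     # semi-gradient TD step on the critic
        # actor: reduce c + V(e_next) w.r.t. u, then chain-rule through u = -k*e:
        # d/du [r*u^2 + w*(e + dt*u)^2] = 2*r*u + 2*w*dt*e_next, du/dk = -e
        dJ_du = 2 * r * u + 2 * w * dt * e_next
        k += alpha_a * dJ_du * e     # descend dJ/dk = dJ_du * (-e)
        e = e_next

print(f"learned critic weight w={w:.3f}, actor gain k={k:.3f}")
```

The critic and actor are updated recursively from data alone, mirroring the abstract's point that the optimal policy and cost function are approximated without an explicit model of the unknowns; the paper replaces these quadratic/linear parameterizations with NN approximators.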