Ganie Irfan, Jagannathan S
Dept. of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, 65401, MO, USA.
Neural Netw. 2025 Nov;191:107793. doi: 10.1016/j.neunet.2025.107793. Epub 2025 Jul 5.
A novel integral reinforcement learning (IRL)-based optimal trajectory tracking scheme for nonlinear continuous-time systems in strict-feedback form is introduced, using backstepping and multilayer, or deep, neural networks (DNNs). The proposed method employs a dynamic surface control-based technique within an optimal framework to relax the need to repeatedly compute the derivatives of the virtual controllers at each step of the backstepping process. At each backstepping step, an actor-critic DNN that relies on an online singular value decomposition (SVD) of the activation function gradient is employed to minimize a discounted value function. Novel online SVD-based weight update laws for the actor and critic DNNs, which are shown to mitigate the vanishing gradient problem, are derived using the control input error and the Bellman error, respectively. A new online lifelong learning (LL) technique that uses the Bellman residual and control input errors to overcome catastrophic forgetting in both the critic and actor DNNs is also attempted, and closed-loop stability is analyzed and demonstrated. The effectiveness of the proposed method is shown in simulations of mobile robot tracking and a ship autopilot, which demonstrate a 76% reduction in total cost compared with the literature.
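The derivative-avoidance idea behind dynamic surface control can be illustrated with a minimal sketch. It assumes the textbook form of the technique, in which each virtual controller is passed through a first-order low-pass filter and the filter's own state derivative stands in for the analytic derivative. The abstract does not give the filter actually used in the paper, so the function names, the time constant `tau`, the step size `dt`, and the sinusoidal test signal are all illustrative assumptions.

```python
import math


def dsc_filter_step(alpha_f, alpha, tau, dt):
    """One forward-Euler step of the filter tau * d(alpha_f)/dt + alpha_f = alpha.

    alpha   : current value of the (unfiltered) virtual controller
    alpha_f : current filter state
    Returns the updated filter state and the filter's derivative estimate.
    """
    # The filter derivative replaces the analytic derivative of alpha.
    alpha_f_dot = (alpha - alpha_f) / tau
    return alpha_f + dt * alpha_f_dot, alpha_f_dot


def run_filter(alpha_fn, tau=0.01, dt=0.001, t_end=2.0):
    """Filter a hypothetical virtual-control signal alpha_fn(t) over [0, t_end)."""
    alpha_f = alpha_fn(0.0)  # start the filter on the signal to avoid a large transient
    alpha_f_dot = 0.0
    steps = int(round(t_end / dt))
    for k in range(steps):
        t = k * dt
        alpha_f, alpha_f_dot = dsc_filter_step(alpha_f, alpha_fn(t), tau, dt)
    return alpha_f, alpha_f_dot


if __name__ == "__main__":
    # Assumed test signal: alpha(t) = sin(t), whose true derivative is cos(t).
    alpha_f, alpha_f_dot = run_filter(math.sin)
    print("filtered value:", alpha_f, "reference:", math.sin(2.0))
    print("derivative estimate:", alpha_f_dot, "reference:", math.cos(2.0))
```

In this sketch a smaller `tau` makes the derivative estimate track the true derivative more closely, at the price of a stiffer filter that needs a smaller `dt` to stay numerically stable.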