Farzanegan Behzad, Moghadam Rohollah, Jagannathan Sarangapani, Natarajan Pappa
IEEE Trans Neural Netw Learn Syst. 2024 Dec;35(12):17254-17265. doi: 10.1109/TNNLS.2023.3301383. Epub 2024 Dec 2.
This article addresses multilayer neural network (MNN)-based optimal adaptive tracking of partially uncertain nonlinear discrete-time (DT) systems in affine form. An actor-critic neural network (NN) approximates the value function and the optimal control policy. The critic NN is updated via a novel hybrid learning scheme: its weights are adjusted once at each sampling instant and also over a finite number of iterations between sampling instants to speed up convergence. Moreover, to deal with the persistency of excitation (PE) condition, a replay buffer is incorporated into the critic update law through concurrent learning. To address the vanishing gradient issue, the actor MNN weights are tuned using the control input error and the critic MNN weights using the temporal difference error (TDE). In addition, a weight consolidation scheme is incorporated into the critic MNN update law to attain lifelong learning and overcome catastrophic forgetting, thus lowering the cumulative cost. Lyapunov analysis shows that the tracking error and the actor and critic weight estimation errors are bounded. Simulation results for the proposed approach on a two-link robot manipulator show reductions of 44% in tracking error and 31% in cumulative cost in a multitask environment.
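The critic update described above combines three ideas: one update per sampling instant, a finite number of extra iterations over a replay buffer between instants, and a consolidation term that anchors the weights to those learned on an earlier task. The following is a minimal sketch of how these pieces might fit together, not the update law from the article. It rests on several assumptions that do not come from the source: a scalar linear plant with a fixed feedback gain (policy evaluation only, with no actor and no tracking), a single quadratic feature in place of a multilayer NN, a normalized gradient step on the squared TD error, and an EWC-style quadratic penalty standing in for the weight consolidation scheme. All names and numeric values are hypothetical.

```python
from collections import deque
import numpy as np

GAMMA = 0.95  # discount factor (assumed)
ALPHA = 0.5   # critic learning rate (assumed)

def features(x):
    """Hypothetical one-term quadratic basis; the article uses a multilayer NN."""
    return np.array([x * x])

def critic_step(w, sample, anchor=None, importance=None, lam=0.0):
    """One normalized gradient step on the squared TD error of one transition."""
    x, cost, x_next = sample
    d = GAMMA * features(x_next) - features(x)  # regressor of the TD error
    td_error = cost + w @ d                     # cost + gamma*V(x') - V(x)
    grad = td_error * d / (1.0 + d @ d)
    if anchor is not None:
        # EWC-style pull toward the previous task's weights (assumed form)
        grad = grad + lam * importance * (w - anchor)
    return w - ALPHA * grad

def train_critic(a, b, gain, w0, steps=20, inner_iters=3, buffer_size=32,
                 use_replay=True, anchor=None, importance=None, lam=0.0):
    """Evaluate the fixed policy u = -gain*x on the plant x' = a*x + b*u."""
    w = np.array(w0, dtype=float)
    buffer = deque(maxlen=buffer_size)
    x = 2.0
    for _ in range(steps):
        u = -gain * x
        cost = x * x + u * u
        x_next = a * x + b * u
        sample = (x, cost, x_next)
        # Update once at the sampling instant with the newest transition.
        w = critic_step(w, sample, anchor, importance, lam)
        if use_replay:
            buffer.append(sample)
            # Finite number of extra sweeps over stored data between instants.
            for _ in range(inner_iters):
                for old in buffer:
                    w = critic_step(w, old, anchor, importance, lam)
        x = x_next
    return w

def true_weight(a, b, gain):
    """Closed-form value-function weight for this toy problem."""
    c = a - b * gain
    return (1.0 + gain ** 2) / (1.0 - GAMMA * c * c)

# Task 1: compare the online-only update with the hybrid (replay) update.
w_star1 = true_weight(1.0, 0.5, 0.4)
w_online = train_critic(1.0, 0.5, 0.4, [0.0], use_replay=False)
w_hybrid = train_critic(1.0, 0.5, 0.4, [0.0])

# Task 2: different plant, trained with and without the consolidation term.
w_star2 = true_weight(0.9, 0.5, 0.4)
w_forget = train_critic(0.9, 0.5, 0.4, w_hybrid)
w_consolidated = train_critic(0.9, 0.5, 0.4, w_hybrid, anchor=w_hybrid,
                              importance=np.array([1.0]), lam=0.05)

print("task 1 target:", w_star1, "online:", w_online, "hybrid:", w_hybrid)
print("task 2 target:", w_star2, "plain:", w_forget,
      "consolidated:", w_consolidated)
```

In this toy setting the state decays toward zero, so the online-only critic stops learning before it reaches the target weight, while the replayed samples keep driving the hybrid update. On the second task, the consolidated weight settles between the two task optima instead of overwriting the first.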