Lu Renzhi, Wang Xiaotao, Ding Yiyu, Zhang Hai-Tao, Zhao Feng, Zhu Lijun, He Yong
IEEE Trans Neural Netw Learn Syst. 2024 Oct 18;PP. doi: 10.1109/TNNLS.2024.3474289.
In this article, an optimal surrounding control algorithm is proposed for multiple unmanned surface vessels (USVs), in which actor-critic reinforcement learning (RL) is utilized to optimize the merging process. Specifically, the multiple-USV optimal surrounding control problem is first transformed into the Hamilton-Jacobi-Bellman (HJB) equation, which is difficult to solve due to its nonlinearity. An adaptive actor-critic RL control paradigm is then proposed to obtain the optimal surrounding strategy, wherein the Bellman residual error is utilized to construct the network update laws. In particular, a virtual controller representing intermediate transitions and an actual controller operating on the dynamics model are employed as surrounding control solutions for the second-order USVs; thus, optimal surrounding control of the USVs is guaranteed. In addition, the stability of the proposed controller is analyzed by means of Lyapunov functions. Finally, numerical simulation results demonstrate that the proposed actor-critic RL-based surrounding controller achieves the surrounding objective while optimizing the evolution process, with reductions of 9.76% in trajectory length and 20.85% in energy consumption compared with an existing controller.
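The core mechanism the abstract describes, critic and actor update laws driven by the Bellman residual error, can be illustrated with a minimal sketch. This is not the paper's controller: it is a generic discrete-time actor-critic on a toy one-dimensional "hold a standoff distance" task, with a linear critic and a Gaussian policy. All names, gains, the feature map, the toy dynamics, and the stabilizing initial actor gain are assumptions introduced for illustration.

```python
# Illustrative sketch only (assumptions throughout): a generic actor-critic
# update driven by the Bellman residual (TD error) on a toy 1-D task where
# an agent must hold a desired standoff distance from a target.
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.95                  # discount factor
alpha_c, alpha_a = 0.1, 0.01  # critic / actor step sizes (assumed gains)
radius = 1.0                  # desired standoff distance (assumed)

def features(s):
    # polynomial features for the linear critic V(s) = w @ features(s)
    return np.array([1.0, s, s * s])

w = np.zeros(3)    # critic weights
theta = 0.5        # actor gain; initialized to a stabilizing guess (assumed)

s = 2.0            # start away from the desired radius
for step in range(2000):
    mu = theta * (radius - s)              # deterministic part of the policy
    a = mu + 0.1 * rng.standard_normal()   # Gaussian exploration noise
    s_next = s + 0.1 * a                   # toy first-order dynamics
    r = -(s_next - radius) ** 2 - 0.01 * a ** 2  # tracking + effort cost

    # Bellman residual (TD error): the same quantity drives both update laws
    phi = features(s)
    delta = r + gamma * w @ features(s_next) - w @ phi

    # critic update: normalized gradient step on the squared residual
    w += alpha_c * delta * phi / (phi @ phi)
    # actor update: likelihood-ratio gradient with delta as the advantage
    theta += alpha_a * delta * (a - mu) * (radius - s)
    s = s_next

print(f"final distance error: {abs(s - radius):.3f}")
```

The point of the sketch is the shared role of `delta`: the critic descends the squared Bellman residual, while the actor ascends a policy-gradient estimate weighted by that same residual, mirroring the "Bellman residual error constructs the network update laws" idea at the simplest possible scale.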