Wang Ning, Gao Ying, Zhang Xuefeng
IEEE Trans Neural Netw Learn Syst. 2021 Dec;32(12):5456-5467. doi: 10.1109/TNNLS.2021.3056444. Epub 2021 Nov 30.
An unmanned surface vehicle (USV) operating in complicated marine environments can hardly be modeled accurately, so model-based optimal control approaches become infeasible. In this article, a self-learning, model-free solution that uses only the input-output signals of the USV is provided. To this end, a data-driven performance-prescribed reinforcement learning control (DPRLC) scheme is created to pursue control optimality and prescribed tracking accuracy simultaneously. By devising a state transformation with prescribed performance, the constrained tracking errors are converted into a constraint-free stabilization problem for tracking errors with unknown dynamics. A reinforcement learning paradigm built on a neural-network-based actor-critic framework is then deployed to directly optimize the controller synthesis deduced from the Bellman error formulation, such that a data-driven optimal controller evolves from the transformed tracking errors. Theoretical analysis ensures that the entire DPRLC scheme guarantees the prescribed tracking accuracy at optimal cost. Both simulations and virtual-reality experiments demonstrate the effectiveness and superiority of the proposed DPRLC scheme.
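The abstract does not give equations, so the following Python sketch only illustrates, under standard assumptions rather than the article's exact formulation, the two ingredients it names: a prescribed-performance error transformation that turns a constrained tracking error into an unconstrained one, and an actor-critic update driven by a Bellman-error residual computed from measured data. All symbols (the bound rho(t), the log-ratio transform, the quadratic stage cost, the basis features, and the simplified actor step) are illustrative assumptions, not taken from the article.

```python
# Hedged sketch of prescribed-performance transformation + Bellman-error-driven
# actor-critic learning; formulas are common textbook choices, not the paper's.
import numpy as np

# --- (i) prescribed performance transformation ----------------------------------
def performance_bound(t, rho0=2.0, rho_inf=0.1, decay=0.5):
    """Exponentially decaying bound rho(t) the tracking error must stay within (assumed)."""
    return (rho0 - rho_inf) * np.exp(-decay * t) + rho_inf

def transform_error(e, rho):
    """Map a constrained error e in (-rho, rho) to an unconstrained variable via a
    log-ratio transform; it grows unbounded as |e| approaches rho, which is what
    enforces the prescribed accuracy."""
    z = np.clip(e / rho, -0.999, 0.999)   # normalized error, clipped for numerical safety
    return 0.5 * np.log((1.0 + z) / (1.0 - z))

# --- (ii) actor-critic update from a Bellman-error residual ----------------------
rng = np.random.default_rng(0)
n_feat = 8                                 # number of basis features (assumed)
Wc = rng.normal(scale=0.1, size=n_feat)    # critic weights
Wa = rng.normal(scale=0.1, size=n_feat)    # actor weights

def features(eps):
    """Illustrative basis over the transformed error."""
    return np.array([eps, eps**2, eps**3, np.tanh(eps),
                     np.sin(eps), np.cos(eps), eps * np.tanh(eps), 1.0])

def step(eps, eps_next, Q=1.0, R=0.1, gamma=0.99, lr_c=1e-2, lr_a=1e-3):
    """One update using only measured (input-output) quantities: the transformed
    error now (eps) and at the next sample (eps_next)."""
    global Wc, Wa
    phi, phi_next = features(eps), features(eps_next)
    u = Wa @ phi                           # actor: control from transformed error
    cost = Q * eps**2 + R * u**2           # assumed quadratic stage cost
    delta = cost + gamma * (Wc @ phi_next) - Wc @ phi   # Bellman-error residual
    Wc += lr_c * delta * (phi - gamma * phi_next)       # critic: residual-gradient step on delta^2
    Wa -= lr_a * delta * phi               # actor: simplified surrogate step (not the article's law)
    return u, delta

# Usage example with a synthetic error trajectory (no vessel model involved).
dt, e = 0.05, 1.5
for k in range(200):
    t = k * dt
    eps = transform_error(e, performance_bound(t))
    e_next = 0.97 * e                      # placeholder for the measured next error
    eps_next = transform_error(e_next, performance_bound(t + dt))
    u, delta = step(eps, eps_next)
    e = e_next
```

The sketch is deliberately model-free in the sense the abstract emphasizes: only the measured errors (and the bound rho(t)) enter the update, never a vessel model.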