Wei Qinglai, Chen Wendi, Tan Xiangmin, Xiao Jun, Dong Qi
IEEE Trans Cybern. 2024 Nov;54(11):7011-7023. doi: 10.1109/TCYB.2024.3443522. Epub 2024 Oct 30.
This article considers observer-based optimal backstepping security control for nonlinear systems using a reinforcement learning (RL) strategy. The main challenge is the design of an optimal controller under deception attacks. To this end, this article introduces an improved secure RL algorithm based on neural network technology within the critic-actor design framework to resist attacks and optimize the overall system. Second, compared with some existing results, relaxing the common assumption on deception attacks is also a difficult research topic. In this article, an unconventional observer that uses the attacked system output is designed to estimate the true system states rendered unavailable by deception attacks, so that the impact of the deception attacks is eliminated and output feedback control is also achieved. By selecting the virtual controllers and the actual controller as the corresponding optimized controllers within the RL framework, the control strategy ensures that all signals in the closed-loop system are semiglobally ultimately bounded. Finally, two simulation experiments are presented to demonstrate the effectiveness of the strategy.
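To make the critic-actor structure mentioned above concrete, the following is a minimal sketch of one generic critic-actor weight-adaptation step for an optimized (virtual) backstepping controller. The RBF basis, gains, update laws, and function names here are illustrative assumptions for exposition only, not the exact design, observer, or attack model used in the paper.

```python
import numpy as np

def rbf_basis(z, centers, width=1.0):
    """Gaussian RBF features of a scalar tracking error z (assumed basis)."""
    return np.exp(-((z - centers) ** 2) / (2.0 * width ** 2))

def critic_actor_step(z, W_c, W_a, centers, gamma_c=2.0, gamma_a=1.0, dt=1e-3):
    """One Euler step of generic critic/actor NN weight adaptation (illustrative)."""
    phi = rbf_basis(z, centers)
    # Critic: gradient-descent-style law driving the critic weights toward
    # a cost approximation consistent with the current error.
    dW_c = -gamma_c * np.outer(phi, phi) @ W_c
    # Actor: tracks the critic while regularizing its own weight estimate.
    dW_a = -np.outer(phi, phi) @ (gamma_a * (W_a - W_c) + gamma_c * W_c)
    W_c = W_c + dt * dW_c
    W_a = W_a + dt * dW_a
    # Optimized virtual controller: nominal stabilizing term plus NN correction.
    alpha = -0.5 * z - 0.5 * (phi @ W_a)
    return alpha, W_c, W_a

# Usage: propagate the weights while an observer (not modeled here)
# would supply the state estimates that form the error z.
centers = np.linspace(-2.0, 2.0, 7)
W_c = np.zeros(7)
W_a = np.zeros(7)
alpha, W_c, W_a = critic_actor_step(z=0.3, W_c=W_c, W_a=W_a, centers=centers)
```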