Shen Hao, Wang Yun, Wu Jiacheng, Park Ju H, Wang Jing
IEEE Trans Cybern. 2024 Nov;54(11):7068-7079. doi: 10.1109/TCYB.2024.3448407. Epub 2024 Oct 30.
This article addresses the secure control problem for discrete-time Markov jump cyber-physical systems subject to malicious attacks by developing a novel resilient hybrid learning scheme. Within a zero-sum game framework, the secure control problem is converted into solving a set of game coupled algebraic Riccati equations. These equations, however, contain coupling terms arising from the Markov jump parameters, which makes them difficult to solve. To address this issue, a parallel reinforcement learning framework is proposed. A model-based resilient hybrid learning scheme is first designed to obtain the optimal policies; it requires the system dynamics during the learning process. A novel online model-free resilient hybrid learning scheme, which combines the advantages of value iteration and policy iteration, is then proposed that does not use the system dynamics. The convergence of the proposed hybrid learning schemes is also discussed. Finally, the effectiveness of the designed algorithms is demonstrated on an inverted pendulum model.
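The abstract does not state the equations themselves, so the following is only a minimal sketch of how coupling between Markov modes enters a game Riccati recursion. It assumes the standard linear-quadratic zero-sum formulation with a scalar state, x_{k+1} = a_i x_k + b_i u_k + d_i w_k, stage cost q_i x^2 + r_i u^2 - gamma^2 w^2, and mode transition probabilities trans[i][j]. It uses plain model-based value iteration, not the article's hybrid scheme, and every name and number in it is hypothetical.

```python
# Toy sketch (assumed formulation, not the article's algorithm):
# value iteration on coupled game Riccati equations for a scalar
# Markov jump system x_{k+1} = a_i x_k + b_i u_k + d_i w_k,
# where u is the controller and w is the attacker.

def riccati_update(p, a, b, d, q, r, gamma, trans):
    """Apply one coupled game Riccati update to the per-mode weights p."""
    n_modes = len(p)
    p_next = []
    for i in range(n_modes):
        # Coupling term: expected next-step weight given current mode i.
        e = sum(trans[i][j] * p[j] for j in range(n_modes))
        # 2x2 matrix M = [[m11, m12], [m12, m22]] acting on (u, w).
        m11 = r[i] + b[i] * e * b[i]
        m12 = b[i] * e * d[i]
        m22 = d[i] * e * d[i] - gamma ** 2
        if m11 <= 0.0 or m22 >= 0.0:
            raise ValueError("saddle-point condition violated; increase gamma")
        v1 = b[i] * e * a[i]
        v2 = d[i] * e * a[i]
        det = m11 * m22 - m12 * m12
        # v^T M^{-1} v via the closed-form inverse of a 2x2 matrix.
        quad = (m22 * v1 * v1 - 2.0 * m12 * v1 * v2 + m11 * v2 * v2) / det
        p_next.append(q[i] + a[i] * e * a[i] - quad)
    return p_next


def coupled_value_iteration(a, b, d, q, r, gamma, trans,
                            max_iters=1000, tol=1e-12):
    """Iterate the update from p = 0 until the largest change is below tol."""
    p = [0.0] * len(a)
    for _ in range(max_iters):
        p_next = riccati_update(p, a, b, d, q, r, gamma, trans)
        change = max(abs(x - y) for x, y in zip(p_next, p))
        p = p_next
        if change < tol:
            break
    return p


# Illustrative two-mode example (all values are assumptions).
a = [1.1, 0.9]            # mode-dependent dynamics
b = [1.0, 1.0]            # control input gains
d = [0.1, 0.1]            # attack input gains
q = [1.0, 1.0]            # state weights
r = [1.0, 1.0]            # control weights
gamma = 5.0               # attenuation level
trans = [[0.9, 0.1],      # row i holds P(next mode = j | mode = i)
         [0.3, 0.7]]

p_star = coupled_value_iteration(a, b, d, q, r, gamma, trans)
print(p_star)
```

Each mode's update depends on the other modes' weights only through the expectation `e`, which is the coupling the abstract refers to. Because the update for mode i reads only the previous iterate, the per-mode updates could in principle be evaluated in parallel, though whether this matches the article's parallel framework is not confirmed by the abstract.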