Fang Haiyang, Zhang Maoguang, He Shuping, Luan Xiaoli, Liu Fei, Ding Zhengtao
IEEE Trans Cybern. 2023 Dec;53(12):7635-7647. doi: 10.1109/TCYB.2022.3186886. Epub 2023 Nov 29.
A novel completely mode-free integral reinforcement learning (CMFIRL)-based iteration algorithm is proposed in this article to solve the two-player zero-sum game, that is, to find the Nash equilibrium (the optimal control policy pair), for a tidal turbine system described by a continuous-time Markov jump linear model with exactly known transition probabilities and completely unknown dynamics. First, the tidal turbine system is modeled as a Markov jump linear system, and a subsystem transformation technique is designed to decouple the jumping modes. Then, a completely mode-free reinforcement learning algorithm is employed to solve the game-coupled algebraic Riccati equations without using any information about the system dynamics, in order to reach the Nash equilibrium. The learning algorithm uses a single iteration loop that updates the control policy and the disturbance policy simultaneously. In addition, an exploration signal is added to excite the system, and the convergence of the CMFIRL iteration algorithm is rigorously proved. Finally, a simulation example is given to illustrate the effectiveness and applicability of the control design approach.
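The single-loop idea of updating the control and disturbance policies simultaneously can be illustrated with a small sketch. This is not the CMFIRL algorithm from the article: it is a model-based iteration for one mode only, it assumes the matrices are known, and it ignores the Markov jump coupling between modes. The system matrices, the attenuation level `GAMMA`, and all function names are hypothetical values chosen for illustration. Each pass evaluates the current policy pair through a Lyapunov equation and then improves both gains at once.

```python
import numpy as np


def solve_lyapunov(a_cl, m):
    """Solve a_cl.T @ P + P @ a_cl + m = 0 for P by Kronecker vectorization."""
    n = a_cl.shape[0]
    eye = np.eye(n)
    lhs = np.kron(eye, a_cl.T) + np.kron(a_cl.T, eye)
    p = np.linalg.solve(lhs, -m.reshape(-1)).reshape(n, n)
    return 0.5 * (p + p.T)  # symmetrize against round-off


def game_are_residual(p, a, b, d, q, r, gamma):
    """Residual of the single-mode game algebraic Riccati equation."""
    return (a.T @ p + p @ a + q
            - p @ b @ np.linalg.solve(r, b.T) @ p
            + p @ d @ d.T @ p / gamma**2)


def simultaneous_policy_iteration(a, b, d, q, r, gamma, iters=50, tol=1e-12):
    """One loop; control gain k (u = -k x) and disturbance gain l (w = l x)
    are both updated in every pass."""
    n = a.shape[0]
    k = np.zeros((b.shape[1], n))  # zero gains are admissible: a is stable here
    l = np.zeros((d.shape[1], n))
    p = np.zeros((n, n))
    for _ in range(iters):
        # Policy evaluation for the current pair (k, l).
        a_cl = a - b @ k + d @ l
        m = q + k.T @ r @ k - gamma**2 * (l.T @ l)
        p_next = solve_lyapunov(a_cl, m)
        # Simultaneous improvement of both policies.
        k = np.linalg.solve(r, b.T @ p_next)
        l = d.T @ p_next / gamma**2
        converged = np.linalg.norm(p_next - p) < tol
        p = p_next
        if converged:
            break
    return p, k, l


# Hypothetical single-mode example data (not taken from the article).
A = np.array([[-1.0, 0.5], [0.0, -2.0]])
B = np.array([[0.0], [1.0]])
D = np.array([[0.5], [0.0]])
Q = np.eye(2)
R = np.eye(1)
GAMMA = 5.0

P, K, L = simultaneous_policy_iteration(A, B, D, Q, R, GAMMA)
RESIDUAL = np.linalg.norm(game_are_residual(P, A, B, D, Q, R, GAMMA))
print("GARE residual norm:", RESIDUAL)
```

According to the abstract, the article's method reaches the same kind of fixed point without using the system dynamics, so the Lyapunov step above would be replaced there by an evaluation built from measured trajectories under an exploration signal. That data-driven step is not reproduced here.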