Zhao Qingtao, Sun Jian, Wang Gang, Chen Jie
IEEE Trans Neural Netw Learn Syst. 2022 May;33(5):1905-1913. doi: 10.1109/TNNLS.2021.3071545. Epub 2022 May 2.
For nonzero-sum (NZS) games of nonlinear systems, reinforcement learning (RL) or adaptive dynamic programming (ADP) has shown its capability of iteratively approximating the desired performance index and the optimal input policy. In this article, an event-triggered ADP is proposed for NZS games of continuous-time nonlinear systems with completely unknown system dynamics. To approximate the Nash equilibrium solution, critic neural networks and actor neural networks are utilized to estimate the value functions and the control policies, respectively. In contrast to the traditional time-triggered mechanism, the proposed algorithm updates the neural network weights as well as the players' inputs only when a state-based event-triggered condition is violated. It is shown that system stability and weight convergence are still guaranteed under mild assumptions, while the consumption of communication and computation resources is considerably reduced. Meanwhile, the infamous Zeno behavior is excluded by proving the existence of a minimum inter-event time (MIET), which ensures the feasibility of the closed-loop event-triggered continuous-time system. Finally, a numerical example is simulated to illustrate the effectiveness of the proposed approach.
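To make the triggering-and-update logic concrete, the following is a minimal, schematic sketch of an event-triggered actor-critic loop for a two-player NZS game. It is not the paper's algorithm: the dynamics f, g1, g2, the quadratic feature basis phi, the triggering threshold, the gradient-policy form, and the normalized-gradient critic update are all illustrative assumptions chosen only to show how inputs and weights are held constant between events and refreshed when the state-based condition fires.

```python
import numpy as np

# Hypothetical two-player NZS game on a 2-D nonlinear plant.
# Every model, feature, and gain below is an illustrative assumption.

def f(x):   # drift dynamics (used here only to simulate the plant)
    return np.array([-x[0] + x[1],
                     -0.5 * (x[0] + x[1]) + 0.5 * x[1] * np.sin(x[0]) ** 2])

def g1(x):  # input map, player 1 (assumption)
    return np.array([0.0, np.cos(x[0])])

def g2(x):  # input map, player 2 (assumption)
    return np.array([0.0, np.sin(x[0])])

def phi(x):  # shared quadratic feature basis for critics/actors (assumption)
    return np.array([x[0] ** 2, x[0] * x[1], x[1] ** 2])

def dphi(x):  # Jacobian of phi with respect to x
    return np.array([[2 * x[0], 0.0],
                     [x[1],     x[0]],
                     [0.0,      2 * x[1]]])

def policy(x_s, W, g, R):  # actor in gradient-policy form: u = -(1/2R) g^T dphi^T W
    return -0.5 / R * g(x_s) @ dphi(x_s).T @ W

W1, W2 = np.zeros(3), np.zeros(3)  # critic weights of the two players
R1, R2 = 1.0, 1.0                  # input weightings (assumption)
alpha = 0.5                        # critic learning rate (assumption)
dt, T = 1e-3, 10.0

x = np.array([1.0, -1.0])
x_hat = x.copy()                   # last-sampled (transmitted) state
events = 0

for k in range(int(T / dt)):
    # State-based event-triggered condition (illustrative choice):
    # fire when the gap ||x - x_hat|| exceeds a state-dependent threshold.
    if np.linalg.norm(x - x_hat) > 0.1 * np.linalg.norm(x) + 1e-3:
        x_hat = x.copy()           # sample the state at the event instant
        events += 1
        u1 = policy(x_hat, W1, g1, R1)
        u2 = policy(x_hat, W2, g2, R2)
        # Critic update only at events: normalized gradient step on a
        # Bellman-like residual (assumed quadratic state/input costs).
        xdot = f(x_hat) + g1(x_hat) * u1 + g2(x_hat) * u2
        for W, u, R in ((W1, u1, R1), (W2, u2, R2)):
            sigma = dphi(x_hat) @ xdot
            e = W @ sigma + x_hat @ x_hat + R * u ** 2
            W -= alpha * e * sigma / (1.0 + sigma @ sigma)  # in-place update

    # Between events, inputs are zero-order held: x_hat and the weights are
    # frozen, so the policies below are constant until the next event.
    u1 = policy(x_hat, W1, g1, R1)
    u2 = policy(x_hat, W2, g2, R2)
    x = x + dt * (f(x) + g1(x) * u1 + g2(x) * u2)

print(f"events: {events} of {int(T / dt)} steps, final ||x|| = {np.linalg.norm(x):.4f}")
```

Because the trigger fires only when the measured state drifts sufficiently far from the last sampled state, the number of events is typically far smaller than the number of integration steps, which is the resource-saving effect the abstract describes; the MIET argument in the paper is what rules out the threshold being crossed infinitely fast (Zeno behavior).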