Li Man, Qin Jiahu, Freris Nikolaos M, Ho Daniel W C
IEEE Trans Neural Netw Learn Syst. 2022 Apr;33(4):1429-1440. doi: 10.1109/TNNLS.2020.3042331. Epub 2022 Apr 4.
In this article, we study a multiplayer Stackelberg-Nash game (SNG) pertaining to a nonlinear dynamical system, involving one leader and multiple followers. At the higher level, the leader makes its decision preferentially, taking into account the reaction functions of all followers, while, at the lower level, the followers react optimally to the leader's strategy by simultaneously playing a Nash game among themselves. First, the optimal strategies for the leader and the followers are derived from the bottom level to the top, and these strategies are further shown to constitute the Stackelberg-Nash equilibrium points. Subsequently, to overcome the difficulty of calculating the equilibrium points analytically, we develop a novel two-level value iteration-based integral reinforcement learning (VI-IRL) algorithm that relies only on partial knowledge of the system dynamics. We establish that the proposed method converges asymptotically to the equilibrium strategies under weak coupling conditions. Moreover, we introduce effective termination criteria that guarantee the admissibility of the policy (strategy) profile obtained after a finite number of iterations of the proposed algorithm. In the implementation of our scheme, we employ neural networks (NNs) to approximate the value functions and invoke least-squares methods to update the involved weights. Finally, the effectiveness of the developed algorithm is verified by two simulation examples.
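The abstract does not give the algorithmic details, but the core ingredients it names (value iteration, a parametric value-function approximator, and least-squares weight updates) can be illustrated with a minimal single-agent sketch. Everything below is an assumption for illustration: a hypothetical scalar discrete-time system, a polynomial basis standing in for the NN approximator, and grid-search minimization over the control; it is not the paper's two-level VI-IRL method.

```python
import numpy as np

# Hypothetical scalar system x_{k+1} = f(x) + g(x) u with quadratic stage cost
# (assumed for illustration; not the system from the article).
def f(x): return 0.9 * x
def g(x): return 1.0
Q, R, gamma = 1.0, 1.0, 0.95

def phi(x):
    # Polynomial basis playing the role of the NN: V(x) ~ w @ phi(x).
    return np.array([x**2, x**4])

w = np.zeros(2)                              # value-function weights
xs = np.linspace(-1.0, 1.0, 41)              # sampled states for regression
us = np.linspace(-2.0, 2.0, 81)              # control grid for the minimization

for _ in range(100):                         # value-iteration sweeps
    targets = []
    for x in xs:
        # Bellman backup: minimize stage cost plus discounted value of successor.
        costs = [Q * x**2 + R * u**2 + gamma * w @ phi(f(x) + g(x) * u)
                 for u in us]
        targets.append(min(costs))
    # Least-squares update of the weights: solve Phi @ w ~ targets.
    Phi = np.stack([phi(x) for x in xs])
    w, *_ = np.linalg.lstsq(Phi, np.array(targets), rcond=None)

print(w)  # dominant quadratic weight, near-zero quartic weight
```

For this linear-quadratic instance the learned value function is essentially quadratic, so the quartic weight shrinks toward zero; the same backup/regression loop is what, in the article's setting, runs at two levels with NN approximators per player.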