IEEE Trans Cybern. 2017 Oct;47(10):3331-3340. doi: 10.1109/TCYB.2016.2611613. Epub 2016 Oct 3.
In this paper, we investigate nonzero-sum games for a class of discrete-time (DT) nonlinear systems by using a novel policy iteration (PI) adaptive dynamic programming (ADP) method. The main idea of the proposed PI scheme is to use the iterative ADP algorithm to obtain iterative control policies that not only ensure that the system achieves stability but also minimize the performance index function of each player. This paper integrates game theory, optimal control theory, and reinforcement learning techniques to formulate and solve multiplayer DT nonzero-sum games. First, we design three actor-critic algorithms for the PI scheme: one offline and two online. Subsequently, neural networks are employed to implement these algorithms, and the corresponding stability analysis is provided via Lyapunov theory. Finally, a numerical simulation example is presented to demonstrate the effectiveness of the proposed approach.
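The PI scheme alternates policy evaluation (computing each player's value under the current joint policy) and policy improvement (each player best-responding to the others). As a minimal sketch of that loop, the following illustrates PI on a hypothetical two-player linear-quadratic DT game with NumPy; the dynamics matrices, cost weights, and iteration counts here are illustrative assumptions, not from the paper, which treats general nonlinear systems via neural-network actor-critic implementations.

```python
import numpy as np

# Hypothetical linear-quadratic example (assumed for illustration only).
# Dynamics: x_{k+1} = A x_k + B1 u1_k + B2 u2_k, with u_i = -K_i x.
A  = np.array([[0.9, 0.2], [0.0, 0.8]])
B1 = np.array([[1.0], [0.0]])
B2 = np.array([[0.0], [1.0]])
Q1, R1 = np.eye(2), np.array([[1.0]])
Q2, R2 = 2 * np.eye(2), np.array([[1.0]])

def evaluate(K1, K2, Q, R, K, iters=500):
    """Policy evaluation: solve P = Q + K'RK + Acl' P Acl by fixed-point
    iteration (converges because the closed loop Acl is stable)."""
    Acl = A - B1 @ K1 - B2 @ K2          # closed-loop dynamics under both policies
    W = Q + K.T @ R @ K                  # per-step cost of this player
    P = np.zeros_like(Q)
    for _ in range(iters):
        P = W + Acl.T @ P @ Acl
    return P

# Initial stabilizing policies: zero gains suffice since A itself is stable.
K1 = np.zeros((1, 2))
K2 = np.zeros((1, 2))
for _ in range(50):                      # policy iteration loop
    P1 = evaluate(K1, K2, Q1, R1, K1)
    P2 = evaluate(K1, K2, Q2, R2, K2)
    # Policy improvement: each player best-responds to the other's gain.
    K1 = np.linalg.solve(R1 + B1.T @ P1 @ B1, B1.T @ P1 @ (A - B2 @ K2))
    K2 = np.linalg.solve(R2 + B2.T @ P2 @ B2, B2.T @ P2 @ (A - B1 @ K1))

# At a Nash equilibrium, neither gain changes under a further improvement step.
print(np.round(K1, 4))
print(np.round(K2, 4))
```

In the paper's setting, the closed-form evaluation and improvement steps above are replaced by critic networks approximating each player's value function and actor networks approximating each control policy, trained offline or online.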