Zhang Qichao, Zhao Dongbin
IEEE Trans Cybern. 2019 Aug;49(8):2874-2885. doi: 10.1109/TCYB.2018.2830820. Epub 2018 May 16.
This paper is concerned with the nonlinear optimization problem of nonzero-sum (NZS) games with unknown drift dynamics. A data-based integral reinforcement learning (IRL) method is proposed to approximate the Nash equilibrium of NZS games iteratively. Furthermore, we prove that the data-based IRL method is equivalent to the model-based policy iteration algorithm, which guarantees the convergence of the proposed method. For implementation purposes, a single-critic neural network structure for the NZS games is presented. To broaden the applicability of the data-based IRL method, we design critic weight update laws for both offline and online iterative learning. An experience replay technique is incorporated into the online iterative learning to improve the convergence rate of the critic weights during learning. The uniform ultimate boundedness of the critic weights is guaranteed using the Lyapunov method. Finally, numerical results demonstrate the effectiveness of the data-based IRL algorithm for nonlinear NZS games with unknown drift dynamics.
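The abstract does not reproduce the paper's equations or algorithms, so the following is only a minimal sketch of the data-based IRL idea it describes: policy evaluation is done from trajectory data over short integration intervals (avoiding the unknown drift term), and policies are improved from the learned critic. The two-player scalar system, the drift f, the input gains g1 and g2, the polynomial critic basis phi, and the cost weights Q and R below are hypothetical placeholders, not the paper's benchmark; the critic update is a plain batch least-squares fit rather than the paper's offline/online update laws with experience replay.

```python
import numpy as np

# Toy two-player NZS game on a scalar system (illustrative placeholders):
#   dx/dt = f(x) + g1(x) u1 + g2(x) u2, with f treated as unknown by the learner.
def f(x):  return -x + 0.5 * x**3      # unknown drift (used only by the simulator)
def g1(x): return 1.0                  # input dynamics are assumed known
def g2(x): return 0.5

Q = [1.0, 1.0]                         # state-cost weights for players 1 and 2
R = [[1.0, 1.0],                       # R[i][j]: penalty player i puts on player j's control
     [1.0, 2.0]]

def phi(x):                            # single-critic basis (hypothetical polynomial features)
    return np.array([x**2, x**4])
def dphi(x):                           # gradient of the basis w.r.t. x
    return np.array([2 * x, 4 * x**3])

def policy(i, x, W):
    """Policy improvement u_i = -1/(2 R_ii) g_i(x) dphi(x)^T W_i."""
    g = g1(x) if i == 0 else g2(x)
    return -0.5 / R[i][i] * g * dphi(x) @ W[i]

def collect(W, T=0.05, dt=0.001, n_traj=40):
    """Roll out short trajectories under the current policies and record the IRL
    data pairs (phi(x_t) - phi(x_{t+T}), integral of the running cost over [t, t+T]).
    The drift f never appears in the recorded data, only in the simulator."""
    A = [[], []]; b = [[], []]
    for x0 in np.linspace(-1, 1, n_traj):
        x = x0
        cost = np.zeros(2)
        phi0 = phi(x)
        for _ in range(int(T / dt)):
            u = [policy(i, x, W) for i in range(2)]
            for i in range(2):
                cost[i] += (Q[i] * x**2 + sum(R[i][j] * u[j]**2 for j in range(2))) * dt
            x += (f(x) + g1(x) * u[0] + g2(x) * u[1]) * dt
        for i in range(2):
            A[i].append(phi0 - phi(x))
            b[i].append(cost[i])
    return [np.array(a) for a in A], [np.array(c) for c in b]

# Offline iterative learning: alternate least-squares policy evaluation
# (fit critic weights to the IRL Bellman data) and policy improvement.
W = [np.zeros(2), np.zeros(2)]
for it in range(10):
    A, b = collect(W)
    W = [np.linalg.lstsq(A[i], b[i], rcond=None)[0] for i in range(2)]
    print(f"iter {it}: W1={W[0]}, W2={W[1]}")
```

In this sketch the online variant would instead update W recursively from a replay stack of recorded data pairs; the least-squares batch solve stands in for both update laws only to keep the example short.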