IEEE Trans Cybern. 2022 Sep;52(9):9170-9178. doi: 10.1109/TCYB.2021.3052832. Epub 2022 Aug 18.
In this article, we study the feedback Nash strategy of the model-free nonzero-sum difference game. The main contribution is a Q-learning algorithm for the linear quadratic game that requires no prior knowledge of the system model. Notably, the studied game has a finite horizon, which distinguishes it from existing learning algorithms in the literature, most of which target the infinite-horizon Nash strategy. The key is to characterize the Q-factors in terms of arbitrary control inputs and state information. A numerical example verifies the effectiveness of the proposed algorithm.
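As a rough illustration of the Q-factor idea, the sketch below estimates a quadratic Q-factor stage by stage from data generated with arbitrary inputs, in the simpler single-player finite-horizon LQ setting rather than the full nonzero-sum game. All system matrices, the horizon, and the sample counts are made-up assumptions for illustration, not values from the paper; the learner never reads `A` or `B` directly, it only queries a simulator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up example system (unknown to the learner; used only as a simulator).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Qc = np.eye(2)   # state cost weight (assumed terminal weight as well)
Rc = np.eye(1)   # input cost weight
N = 5            # assumed horizon length
nx, nu = 2, 1
nz = nx + nu

def quad_features(z):
    """Upper-triangular monomials z_i * z_j, used to regress z' H z."""
    return np.outer(z, z)[np.triu_indices(nz)]

P = Qc.copy()    # terminal value matrix P_N
K_learned = []
for k in range(N - 1, -1, -1):
    # Sample arbitrary (exploratory) state/input pairs and observed stage costs.
    Z, y = [], []
    for _ in range(60):
        x = rng.standard_normal(nx)
        u = rng.standard_normal(nu)
        x_next = A @ x + B @ u   # simulator query; model matrices not used by the learner
        q = x @ Qc @ x + u @ Rc @ u + x_next @ P @ x_next
        Z.append(quad_features(np.concatenate([x, u])))
        y.append(q)
    theta, *_ = np.linalg.lstsq(np.array(Z), np.array(y), rcond=None)
    # Rebuild the symmetric Q-factor matrix H_k from the regression weights
    # (off-diagonal monomials appear once in the features but twice in z' H z).
    H = np.zeros((nz, nz))
    H[np.triu_indices(nz)] = theta
    H = (H + H.T) / 2
    Hxx, Hxu, Huu = H[:nx, :nx], H[:nx, nx:], H[nx:, nx:]
    K = -np.linalg.solve(Huu, Hxu.T)   # stage-k feedback gain from the Q-factor
    P = Hxx + Hxu @ K                  # value update P_k = Hxx - Hxu Huu^{-1} Hux
    K_learned.append(K)

# Model-based Riccati recursion, only for checking the data-driven gains.
P_true = Qc.copy()
for _ in range(N):
    S = Rc + B.T @ P_true @ B
    K_true = -np.linalg.solve(S, B.T @ P_true @ A)
    P_true = Qc + A.T @ P_true @ A + A.T @ P_true @ B @ K_true

print(np.max(np.abs(K_learned[-1] - K_true)))  # stage-0 gain error
```

Since the data are noiseless and the Q-factor is exactly quadratic, the least-squares fit recovers each H_k exactly (up to round-off), so the learned stage-0 gain matches the Riccati gain; in the paper's game setting the same characterization is carried out per player.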