Q-Learning for Feedback Nash Strategy of Finite-Horizon Nonzero-Sum Difference Games.

Publication Information

IEEE Trans Cybern. 2022 Sep;52(9):9170-9178. doi: 10.1109/TCYB.2021.3052832. Epub 2022 Aug 18.

Abstract

In this article, we study the feedback Nash strategy of the model-free nonzero-sum difference game. The main contribution is to present a Q-learning algorithm for the linear-quadratic game without prior knowledge of the system model. Notably, the studied game has a finite horizon, which distinguishes it from the learning algorithms in the literature, most of which target the infinite-horizon Nash strategy. The key is to characterize the Q-factors in terms of arbitrary control inputs and state information. A numerical example is given to verify the effectiveness of the proposed algorithm.
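For concreteness, the Q-factor characterization can be sketched in the standard two-player finite-horizon linear-quadratic setting; the notation below (A, B_i, Q_i, R_{ij}, P_k^i, horizon N) is a common convention assumed here, not necessarily the paper's exact formulation.

\[
x_{k+1} = A x_k + B_1 u_k^1 + B_2 u_k^2, \qquad k = 0, 1, \dots, N-1,
\]
\[
J_i = x_N^\top Q_{iN} x_N + \sum_{k=0}^{N-1} \Bigl( x_k^\top Q_i x_k + (u_k^1)^\top R_{i1} u_k^1 + (u_k^2)^\top R_{i2} u_k^2 \Bigr), \qquad i = 1, 2.
\]

With a quadratic value-to-go \( V_{k+1}^i(x) = x^\top P_{k+1}^i x \), the stage-k Q-factor of player i, evaluated at an arbitrary state and arbitrary (not necessarily Nash) inputs, is itself a quadratic form:

\[
Q_k^i(x, u^1, u^2) = x^\top Q_i x + (u^1)^\top R_{i1} u^1 + (u^2)^\top R_{i2} u^2 + (A x + B_1 u^1 + B_2 u^2)^\top P_{k+1}^i (A x + B_1 u^1 + B_2 u^2) = z^\top H_k^i z,
\]
\[
z = \bigl[\, x^\top \;\; (u^1)^\top \;\; (u^2)^\top \,\bigr]^\top .
\]

Since the kernel \( H_k^i \) enters this form linearly, it can be estimated from sampled state and input data alone, without knowledge of \( (A, B_1, B_2) \); the stage-k feedback Nash gains then follow from the simultaneous stationarity conditions \( \partial Q_k^i / \partial u^i = 0 \), \( i = 1, 2 \), a coupled linear system solved backward from stage \( N-1 \).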

