Q-Learning for Feedback Nash Strategy of Finite-Horizon Nonzero-Sum Difference Games.

Publication Information

IEEE Trans Cybern. 2022 Sep;52(9):9170-9178. doi: 10.1109/TCYB.2021.3052832. Epub 2022 Aug 18.

Abstract

In this article, we study the feedback Nash strategy of the model-free nonzero-sum difference game. The main contribution is to present a Q-learning algorithm for the linear-quadratic game without prior knowledge of the system model. Notably, the studied game has a finite horizon, which distinguishes it from the learning algorithms in the literature, most of which target the infinite-horizon Nash strategy. The key is to characterize the Q-factors in terms of arbitrary control inputs and state information. A numerical example is given to verify the effectiveness of the proposed algorithm.
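For concreteness, the Q-factor characterization can be sketched in the standard two-player finite-horizon linear-quadratic setting; the notation below (A, B_i, Q_i, R_{ij}, P_k^i, horizon N) is a common convention assumed here, not necessarily the paper's exact formulation.

\[
x_{k+1} = A x_k + B_1 u_k^1 + B_2 u_k^2, \qquad k = 0, 1, \dots, N-1,
\]
\[
J_i = x_N^\top Q_{iN} x_N + \sum_{k=0}^{N-1} \Bigl( x_k^\top Q_i x_k + (u_k^1)^\top R_{i1} u_k^1 + (u_k^2)^\top R_{i2} u_k^2 \Bigr), \qquad i = 1, 2.
\]

With a quadratic value-to-go \( V_{k+1}^i(x) = x^\top P_{k+1}^i x \), the stage-k Q-factor of player i, evaluated at an arbitrary state and arbitrary (not necessarily Nash) inputs, is itself a quadratic form:

\[
Q_k^i(x, u^1, u^2) = x^\top Q_i x + (u^1)^\top R_{i1} u^1 + (u^2)^\top R_{i2} u^2 + (A x + B_1 u^1 + B_2 u^2)^\top P_{k+1}^i (A x + B_1 u^1 + B_2 u^2) = z^\top H_k^i z,
\]
\[
z = \bigl[\, x^\top \;\; (u^1)^\top \;\; (u^2)^\top \,\bigr]^\top .
\]

Since the kernel \( H_k^i \) enters this form linearly, it can be estimated from sampled state and input data alone, without knowledge of \( (A, B_1, B_2) \); the stage-k feedback Nash gains then follow from the simultaneous stationarity conditions \( \partial Q_k^i / \partial u^i = 0 \), \( i = 1, 2 \), a coupled linear system solved backward from stage \( N-1 \).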

