Discrete-Time Nonzero-Sum Games for Multiplayer Using Policy-Iteration-Based Adaptive Dynamic Programming Algorithms.

Publication Information

IEEE Trans Cybern. 2017 Oct;47(10):3331-3340. doi: 10.1109/TCYB.2016.2611613. Epub 2016 Oct 3.

Abstract

In this paper, we investigate nonzero-sum games for a class of discrete-time (DT) nonlinear systems using a novel policy iteration (PI) adaptive dynamic programming (ADP) method. The main idea of our proposed PI scheme is to use an iterative ADP algorithm to obtain iterative control policies that not only stabilize the system but also minimize the performance index function of each player. The paper integrates game theory, optimal control theory, and reinforcement learning techniques to formulate and solve DT nonzero-sum games with multiple players. First, we design three actor-critic algorithms for the PI scheme, one offline and two online. Subsequently, neural networks are employed to implement these algorithms, and the corresponding stability analysis is provided via Lyapunov theory. Finally, a numerical simulation example is presented to demonstrate the effectiveness of our proposed approach.
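To make the policy-iteration idea concrete, below is a minimal sketch of the scheme for a two-player DT nonzero-sum game, specialized to linear dynamics and quadratic performance indices so that policy evaluation reduces to a discrete Lyapunov equation and policy improvement has a closed form. The system matrices A, B1, B2 and the weights Q_i, R_ij are illustrative assumptions, not values from the paper; the paper's actual implementation handles general nonlinear dynamics with neural-network actor-critic structures.

```python
# Hedged sketch of policy-iteration ADP for a two-player discrete-time
# nonzero-sum game, restricted to the linear-quadratic case.  All numerical
# values below are illustrative assumptions, not taken from the paper.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Assumed dynamics: x_{k+1} = A x_k + B1 u1_k + B2 u2_k
A  = np.array([[0.9, 0.1],
               [0.0, 0.8]])
B1 = np.array([[1.0], [0.0]])
B2 = np.array([[0.0], [1.0]])

# Assumed quadratic performance indices of the two players
Q1, Q2   = np.eye(2), 2.0 * np.eye(2)
R11, R22 = np.eye(1), np.eye(1)              # own-control weights
R12, R21 = 0.5 * np.eye(1), 0.5 * np.eye(1)  # cross-control weights

# Initial admissible (stabilizing) policies u_i = -K_i x; K_i = 0 is
# admissible here because A itself is stable.
K1 = np.zeros((1, 2))
K2 = np.zeros((1, 2))

for it in range(50):
    Acl = A - B1 @ K1 - B2 @ K2  # closed loop under the current policies

    # Policy evaluation: V_i(x) = x^T P_i x satisfies the Lyapunov equation
    # P_i = Acl^T P_i Acl + Q_i + K1^T R_i1 K1 + K2^T R_i2 K2.
    P1 = solve_discrete_lyapunov(Acl.T, Q1 + K1.T @ R11 @ K1 + K2.T @ R12 @ K2)
    P2 = solve_discrete_lyapunov(Acl.T, Q2 + K1.T @ R21 @ K1 + K2.T @ R22 @ K2)

    # Policy improvement: each player minimizes its own cost-to-go while the
    # other player's current policy is held fixed.
    K1_new = np.linalg.solve(R11 + B1.T @ P1 @ B1, B1.T @ P1 @ (A - B2 @ K2))
    K2_new = np.linalg.solve(R22 + B2.T @ P2 @ B2, B2.T @ P2 @ (A - B1 @ K1))

    if max(np.max(np.abs(K1_new - K1)), np.max(np.abs(K2_new - K2))) < 1e-10:
        K1, K2 = K1_new, K2_new
        break
    K1, K2 = K1_new, K2_new

print("Player 1 feedback gain K1:", K1)
print("Player 2 feedback gain K2:", K2)
```

Roughly speaking, in the nonlinear neural-network setting of the paper the Lyapunov solve is replaced by training critic networks to satisfy the analogous fixed-point relation on sampled data, and the closed-form gain update is replaced by training actor networks, either offline or online as in the three algorithms described above.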
