Discrete-Time Nonzero-Sum Games for Multiplayer Using Policy-Iteration-Based Adaptive Dynamic Programming Algorithms.

Publication information

IEEE Trans Cybern. 2017 Oct;47(10):3331-3340. doi: 10.1109/TCYB.2016.2611613. Epub 2016 Oct 3.

DOI:10.1109/TCYB.2016.2611613
PMID:28113535
Abstract

In this paper, we investigate nonzero-sum games for a class of discrete-time (DT) nonlinear systems using a novel policy iteration (PI) adaptive dynamic programming (ADP) method. The main idea of the proposed PI scheme is to use the iterative ADP algorithm to obtain iterative control policies that not only ensure that the system achieves stability but also minimize the performance index function of each player. This paper integrates game theory, optimal control theory, and reinforcement learning techniques to formulate and handle multiplayer DT nonzero-sum games. First, we design three actor-critic algorithms for the PI scheme: one offline and two online. Subsequently, neural networks are employed to implement these algorithms, and the corresponding stability analysis is provided via Lyapunov theory. Finally, a numerical simulation example is presented to demonstrate the effectiveness of the proposed approach.
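
The abstract gives no equations, so the following is only a minimal sketch of the general policy-iteration idea, not the paper's neural-network actor-critic algorithms. It assumes a scalar linear system x[k+1] = a*x + b[0]*u0 + b[1]*u1 with two players, linear feedback policies u_i = -k[i]*x, and quadratic costs q[i]*x^2 + r[i]*u_i^2. The function name, the parameter names, and the numerical values are all illustrative assumptions. Each pass evaluates every player's value function under the current joint policy, then lets each player improve its own gain while the other player's gain is held fixed. Convergence of this simultaneous update is not guaranteed in general.

```python
def policy_iteration(a, b, q, r, iters=200):
    """Policy iteration sketch for a two-player scalar LQ nonzero-sum game.

    Dynamics: x[k+1] = a*x + b[0]*u0 + b[1]*u1, with policies u_i = -k[i]*x.
    Player i minimizes the sum over time of q[i]*x^2 + r[i]*u_i^2.
    Returns (gains, values), where V_i(x) = values[i] * x^2.
    """
    # Zero gains are an admissible (stabilizing) start only because we
    # assume the open-loop system is stable, i.e. |a| < 1.
    k = [0.0, 0.0]
    p = [0.0, 0.0]
    for _ in range(iters):
        a_cl = a - b[0] * k[0] - b[1] * k[1]  # closed-loop dynamics
        if abs(a_cl) >= 1.0:
            raise ValueError("joint policy is not stabilizing")
        # Policy evaluation: p solves p = q + r*k^2 + a_cl^2 * p
        # (a scalar Lyapunov equation) for each player.
        p = [(q[i] + r[i] * k[i] ** 2) / (1.0 - a_cl ** 2) for i in range(2)]
        # Policy improvement: player i minimizes
        #   r[i]*u^2 + p[i]*(a*x + b[i]*u - b[j]*k[j]*x)^2
        # over u, with the other player's previous gain k[j] held fixed.
        k = [
            b[i] * p[i] * (a - b[1 - i] * k[1 - i]) / (r[i] + b[i] ** 2 * p[i])
            for i in range(2)
        ]
    return k, p


# Illustrative parameters only; they are not taken from the paper.
gains, values = policy_iteration(a=0.9, b=[1.0, 0.5], q=[1.0, 1.0], r=[1.0, 1.0])
print("gains:", gains)
print("values:", values)
```

At a fixed point of this loop, each gain is a best response to the other under that player's own value function. That is the coupled condition a feedback Nash equilibrium of this toy game has to satisfy.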


Similar articles

1
Discrete-Time Nonzero-Sum Games for Multiplayer Using Policy-Iteration-Based Adaptive Dynamic Programming Algorithms.
IEEE Trans Cybern. 2017 Oct;47(10):3331-3340. doi: 10.1109/TCYB.2016.2611613. Epub 2016 Oct 3.
2
Near-Optimal Control for Nonzero-Sum Differential Games of Continuous-Time Nonlinear Systems Using Single-Network ADP.
IEEE Trans Cybern. 2013 Feb;43(1):206-16. doi: 10.1109/TSMCB.2012.2203336. Epub 2012 Jun 28.
3
Off-Policy Integral Reinforcement Learning Method to Solve Nonlinear Continuous-Time Multiplayer Nonzero-Sum Games.
IEEE Trans Neural Netw Learn Syst. 2017 Mar;28(3):704-713. doi: 10.1109/TNNLS.2016.2582849. Epub 2016 Jul 20.
4
Asynchronous learning for actor-critic neural networks and synchronous triggering for multiplayer system.
ISA Trans. 2022 Oct;129(Pt B):295-308. doi: 10.1016/j.isatra.2022.02.007. Epub 2022 Feb 10.
5
Decentralized Event-Triggered Adaptive Control of Discrete-Time Nonzero-Sum Games Over Wireless Sensor-Actuator Networks With Input Constraints.
IEEE Trans Neural Netw Learn Syst. 2020 Oct;31(10):4254-4266. doi: 10.1109/TNNLS.2019.2953613. Epub 2020 Jan 13.
6
Infinite horizon self-learning optimal control of nonaffine discrete-time nonlinear systems.
IEEE Trans Neural Netw Learn Syst. 2015 Apr;26(4):866-79. doi: 10.1109/TNNLS.2015.2401334. Epub 2015 Mar 2.
7
Data-Based Reinforcement Learning for Nonzero-Sum Games With Unknown Drift Dynamics.
IEEE Trans Cybern. 2019 Aug;49(8):2874-2885. doi: 10.1109/TCYB.2018.2830820. Epub 2018 May 16.
8
Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems.
IEEE Trans Neural Netw Learn Syst. 2014 Mar;25(3):621-34. doi: 10.1109/TNNLS.2013.2281663.
9
Model-Free Adaptive Optimal Control for Unknown Nonlinear Multiplayer Nonzero-Sum Game.
IEEE Trans Neural Netw Learn Syst. 2022 Feb;33(2):879-892. doi: 10.1109/TNNLS.2020.3030127. Epub 2022 Feb 3.
10
Experience Replay for Optimal Control of Nonzero-Sum Game Systems With Unknown Dynamics.
IEEE Trans Cybern. 2016 Mar;46(3):854-65. doi: 10.1109/TCYB.2015.2488680. Epub 2015 Oct 26.

Cited by

1
Optimal Robust Control of Nonlinear Systems with Unknown Dynamics via NN Learning with Relaxed Excitation.
Entropy (Basel). 2024 Jan 14;26(1):0. doi: 10.3390/e26010072.