Data-Based Reinforcement Learning for Nonzero-Sum Games With Unknown Drift Dynamics.

Authors

Zhang Qichao, Zhao Dongbin

Publication information

IEEE Trans Cybern. 2019 Aug;49(8):2874-2885. doi: 10.1109/TCYB.2018.2830820. Epub 2018 May 16.

DOI: 10.1109/TCYB.2018.2830820
PMID: 29994780

Abstract

This paper addresses the nonlinear optimization problem of nonzero-sum (NZS) games with unknown drift dynamics. A data-based integral reinforcement learning (IRL) method is proposed to approximate the Nash equilibrium of NZS games iteratively. Furthermore, we prove that the data-based IRL method is equivalent to the model-based policy iteration algorithm, which guarantees the convergence of the proposed method. For implementation, a single-critic neural network structure for NZS games is given. To broaden the applicability of the data-based IRL method, we design two updating laws for the critic weights, one based on offline iterative learning and one based on online iterative learning. The experience replay technique is introduced in the online iterative learning, which can improve the convergence rate of the critic weights during the learning process. The uniform ultimate boundedness of the critic weights is guaranteed using the Lyapunov method. Finally, numerical results demonstrate the effectiveness of the data-based IRL algorithm for nonlinear NZS games with unknown drift dynamics.
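
The equivalence between data-based IRL and model-based policy iteration can be illustrated on a toy problem. The sketch below is not the paper's algorithm or its neural-network critic; it is a minimal on-policy IRL loop for a hypothetical scalar, linear, two-player game dx/dt = a x + b1 u1 + b2 u2 with quadratic costs and values of the assumed form V_i(x) = p_i x^2. All numbers and function names (A_TRUE, B, Q, R, collect_data, irl_policy_evaluation, and so on) are illustrative assumptions. The learner fits p_i from trajectory data through the integral Bellman equation p_i (x(t)^2 - x(t+T)^2) = integral of player i's cost over [t, t+T], so the drift a is used only inside the simulator.

```python
import numpy as np

# Hypothetical toy game (assumed values, not taken from the paper).
A_TRUE = -1.0                           # drift: used by the simulator only
B = np.array([1.0, 0.5])                # input gains b_1, b_2 (assumed known)
Q = np.array([1.0, 2.0])                # state weights q_1, q_2
R = np.array([[1.0, 0.5], [0.5, 1.0]])  # R[i, j]: weight of u_j in player i's cost


def augmented_rhs(z, k):
    """Derivative of [x, cost_1, cost_2] under the policies u_i = -k_i * x."""
    x = z[0]
    u = -k * x
    dx = A_TRUE * x + B @ u
    dcost = Q * x**2 + R @ u**2
    return np.concatenate(([dx], dcost))


def collect_data(k, x0=1.0, interval=0.1, n_intervals=20, substeps=50):
    """Simulate with RK4; return (x(t), x(t+T), integral costs) per interval."""
    h = interval / substeps
    x = x0
    samples = []
    for _ in range(n_intervals):
        z = np.array([x, 0.0, 0.0])  # reset the running costs each interval
        for _ in range(substeps):
            s1 = augmented_rhs(z, k)
            s2 = augmented_rhs(z + 0.5 * h * s1, k)
            s3 = augmented_rhs(z + 0.5 * h * s2, k)
            s4 = augmented_rhs(z + h * s3, k)
            z = z + (h / 6.0) * (s1 + 2 * s2 + 2 * s3 + s4)
        samples.append((x, z[0], z[1:].copy()))
        x = z[0]
    return samples


def irl_policy_evaluation(samples):
    """Least-squares fit of p_i from the integral Bellman equation (no drift)."""
    phi = np.array([xs**2 - xe**2 for xs, xe, _ in samples])
    rho = np.array([cost for _, _, cost in samples])  # shape (n_samples, 2)
    return (phi @ rho) / (phi @ phi)


def data_based_irl(n_iter=30):
    """Alternate data-based evaluation and policy improvement."""
    k = np.zeros(2)  # admissible start because the assumed drift is stable
    for _ in range(n_iter):
        p = irl_policy_evaluation(collect_data(k))
        k = B * p / np.diag(R)  # u_i = -(b_i p_i / r_ii) x
    return p, k


def model_based_pi(n_iter=30):
    """Reference policy iteration that uses the drift explicitly."""
    k = np.zeros(2)
    for _ in range(n_iter):
        a_cl = A_TRUE - B @ k
        p = (Q + R @ k**2) / (-2.0 * a_cl)  # scalar Lyapunov equation
        k = B * p / np.diag(R)
    return p, k


if __name__ == "__main__":
    print("data-based IRL :", data_based_irl())
    print("model-based PI :", model_based_pi())
```

On this toy problem the two loops should agree up to the numerical integration error, since the integral Bellman equation and the Lyapunov equation describe the same value function for a given pair of policies. Policy iteration is not guaranteed to converge for every NZS game; the parameter values here were chosen so that it does.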

Similar articles

1. Data-Based Reinforcement Learning for Nonzero-Sum Games With Unknown Drift Dynamics.
IEEE Trans Cybern. 2019 Aug;49(8):2874-2885. doi: 10.1109/TCYB.2018.2830820. Epub 2018 May 16.
2. Off-Policy Integral Reinforcement Learning Method to Solve Nonlinear Continuous-Time Multiplayer Nonzero-Sum Games.
IEEE Trans Neural Netw Learn Syst. 2017 Mar;28(3):704-713. doi: 10.1109/TNNLS.2016.2582849. Epub 2016 Jul 20.
3. Experience Replay for Optimal Control of Nonzero-Sum Game Systems With Unknown Dynamics.
IEEE Trans Cybern. 2016 Mar;46(3):854-65. doi: 10.1109/TCYB.2015.2488680. Epub 2015 Oct 26.
4. Event-Triggered ADP for Nonzero-Sum Games of Unknown Nonlinear Systems.
IEEE Trans Neural Netw Learn Syst. 2022 May;33(5):1905-1913. doi: 10.1109/TNNLS.2021.3071545. Epub 2022 May 2.
5. Event-triggered integral reinforcement learning for nonzero-sum games with asymmetric input saturation.
Neural Netw. 2022 Aug;152:212-223. doi: 10.1016/j.neunet.2022.04.013. Epub 2022 Apr 21.
6. Discrete-Time Nonzero-Sum Games for Multiplayer Using Policy-Iteration-Based Adaptive Dynamic Programming Algorithms.
IEEE Trans Cybern. 2017 Oct;47(10):3331-3340. doi: 10.1109/TCYB.2016.2611613. Epub 2016 Oct 3.
7. Near-Optimal Control for Nonzero-Sum Differential Games of Continuous-Time Nonlinear Systems Using Single-Network ADP.
IEEE Trans Cybern. 2013 Feb;43(1):206-16. doi: 10.1109/TSMCB.2012.2203336. Epub 2012 Jun 28.
8. Integral Reinforcement-Learning-Based Optimal Containment Control for Partially Unknown Nonlinear Multiagent Systems.
Entropy (Basel). 2023 Jan 23;25(2):221. doi: 10.3390/e25020221.
9. Approximate Optimal Distributed Control of Nonlinear Interconnected Systems Using Event-Triggered Nonzero-Sum Games.
IEEE Trans Neural Netw Learn Syst. 2019 May;30(5):1512-1522. doi: 10.1109/TNNLS.2018.2869896. Epub 2018 Oct 8.
10. Discrete-Time Non-Zero-Sum Games With Completely Unknown Dynamics.
IEEE Trans Cybern. 2021 Jun;51(6):2929-2943. doi: 10.1109/TCYB.2019.2957406. Epub 2021 May 18.