Approximate N-Player Nonzero-Sum Game Solution for an Uncertain Continuous Nonlinear System.

Publication information

IEEE Trans Neural Netw Learn Syst. 2015 Aug;26(8):1645-58. doi: 10.1109/TNNLS.2014.2350835. Epub 2014 Oct 8.

DOI:10.1109/TNNLS.2014.2350835
PMID:25312943
Abstract

An approximate online equilibrium solution is developed for an N-player nonzero-sum game subject to continuous-time nonlinear unknown dynamics and an infinite-horizon quadratic cost. A novel actor-critic-identifier structure is used, wherein a robust dynamic neural network (NN) asymptotically identifies the uncertain system with additive disturbances, and a set of critic and actor NNs approximates the value functions and equilibrium policies, respectively. The weight update laws for the actor NNs are generated using a gradient-descent method, and those for the critic NNs are generated by least-squares regression; both are based on the modified Bellman error, which is independent of the system dynamics. A Lyapunov-based stability analysis shows that uniformly ultimately bounded tracking is achieved, and a convergence analysis demonstrates that the approximate control policies converge to a neighborhood of the optimal solutions. The actor, critic, and identifier structures are implemented continuously and simultaneously in real time. Simulations on two- and three-player games illustrate the performance of the developed method.
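To make the roles of the critic and actor concrete, here is a minimal toy sketch. It is not the paper's method: it assumes a known scalar linear model (so no identifier NN is needed), one weight per critic, and batch updates rather than continuous-time update laws. The constants `A`, `B`, `Q`, the sampled `STATES`, and all function names are hypothetical choices made for illustration. The critic weight of each player is fitted by least squares on the Bellman error, and each actor gain takes a gradient-descent step toward the policy implied by its critic.

```python
# Hypothetical scalar two-player game; every number here is an illustrative assumption.
#   x_dot  = A*x + B[0]*u_1 + B[1]*u_2
#   cost_i = integral of (Q[i]*x^2 + u_i^2) dt,  V_i(x) ~ w_i*x^2,  u_i = -k_i*x
A = 0.0
B = [1.0, 1.0]
Q = [3.0, 3.0]
STATES = [-2.0 + 0.1 * n for n in range(41)]  # sampled states in [-2, 2]


def closed_loop_rate(k):
    """Return x_dot / x when both players apply u_i = -k_i * x."""
    return A - B[0] * k[0] - B[1] * k[1]


def bellman_error(i, w_i, k, x):
    """dV_i/dx * x_dot plus the running cost of player i, evaluated at state x."""
    x_dot = closed_loop_rate(k) * x
    return 2.0 * w_i * x * x_dot + (Q[i] + k[i] ** 2) * x ** 2


def critic_least_squares(i, k):
    """Critic weight w_i that minimizes the summed squared Bellman error over STATES."""
    num = 0.0
    den = 0.0
    for x in STATES:
        omega = 2.0 * x * closed_loop_rate(k) * x  # regressor that multiplies w_i
        cost = (Q[i] + k[i] ** 2) * x ** 2
        num += omega * cost
        den += omega * omega
    return -num / den


def solve(k0, steps=200, lr=0.5):
    """Alternate critic fits and actor gradient steps, starting from stabilizing gains k0."""
    k = list(k0)
    w = [0.0, 0.0]
    for _ in range(steps):
        w = [critic_least_squares(i, k) for i in range(2)]
        # Actor: one gradient-descent step on 0.5*(k_i - B[i]*w_i)^2, whose
        # minimizer is the stationary policy u_i = -0.5 * B[i] * dV_i/dx.
        k = [k[i] - lr * (k[i] - B[i] * w[i]) for i in range(2)]
    return w, k
```

Calling `solve([2.0, 0.5])` returns the critic weights and actor gains. For these particular numbers the coupled stationarity conditions reduce to `Q[i] - 3*w_i**2 = 0`, so both weights and both gains should approach 1, at which point `bellman_error` is zero at every sampled state.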

Similar articles

1. Approximate N-Player Nonzero-Sum Game Solution for an Uncertain Continuous Nonlinear System.
   IEEE Trans Neural Netw Learn Syst. 2015 Aug;26(8):1645-58. doi: 10.1109/TNNLS.2014.2350835. Epub 2014 Oct 8.
2. Asynchronous learning for actor-critic neural networks and synchronous triggering for multiplayer system.
   ISA Trans. 2022 Oct;129(Pt B):295-308. doi: 10.1016/j.isatra.2022.02.007. Epub 2022 Feb 10.
3. Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks.
   IEEE Trans Neural Netw Learn Syst. 2013 Oct;24(10):1513-25. doi: 10.1109/TNNLS.2013.2276571.
4. Multiple actor-critic structures for continuous-time optimal control using input-output data.
   IEEE Trans Neural Netw Learn Syst. 2015 Apr;26(4):851-65. doi: 10.1109/TNNLS.2015.2399020. Epub 2015 Feb 26.
5. Differential-game for resource aware approximate optimal control of large-scale nonlinear systems with multiple players.
   Neural Netw. 2020 Apr;124:95-108. doi: 10.1016/j.neunet.2019.12.031. Epub 2020 Jan 14.
6. Adaptive Reinforcement Learning Neural Network Control for Uncertain Nonlinear System With Input Saturation.
   IEEE Trans Cybern. 2020 Aug;50(8):3433-3443. doi: 10.1109/TCYB.2019.2921057. Epub 2019 Jun 26.
7. Neural network-based finite-horizon optimal control of uncertain affine nonlinear discrete-time systems.
   IEEE Trans Neural Netw Learn Syst. 2015 Mar;26(3):486-99. doi: 10.1109/TNNLS.2014.2315646.
8. Experience Replay for Optimal Control of Nonzero-Sum Game Systems With Unknown Dynamics.
   IEEE Trans Cybern. 2016 Mar;46(3):854-65. doi: 10.1109/TCYB.2015.2488680. Epub 2015 Oct 26.
9. Advanced optimal tracking integrating a neural critic technique for asymmetric constrained zero-sum games.
   Neural Netw. 2024 Sep;177:106388. doi: 10.1016/j.neunet.2024.106388. Epub 2024 May 15.
10. A Nonlinear Finite-Time Robust Differential Game Guidance Law.
    Sensors (Basel). 2022 Sep 2;22(17):6650. doi: 10.3390/s22176650.