Approximate N-Player Nonzero-Sum Game Solution for an Uncertain Continuous Nonlinear System.

Publication information

IEEE Trans Neural Netw Learn Syst. 2015 Aug;26(8):1645-58. doi: 10.1109/TNNLS.2014.2350835. Epub 2014 Oct 8.

Abstract

An approximate online equilibrium solution is developed for an N-player nonzero-sum game subject to continuous-time nonlinear unknown dynamics and an infinite-horizon quadratic cost. A novel actor-critic-identifier structure is used, wherein a robust dynamic neural network (NN) asymptotically identifies the uncertain system with additive disturbances, and a set of critic and actor NNs approximates the value functions and equilibrium policies, respectively. The weight update laws for the actor NNs are generated using a gradient-descent method, and those for the critic NNs are generated by least-squares regression; both are based on a modified Bellman error that is independent of the system dynamics. A Lyapunov-based stability analysis shows that uniformly ultimately bounded tracking is achieved, and a convergence analysis demonstrates that the approximate control policies converge to a neighborhood of the optimal solutions. The actor, critic, and identifier structures are implemented continuously and simultaneously in real time. Simulations on two- and three-player games illustrate the performance of the developed method.
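
For orientation, the problem class named in the abstract can be sketched in a common textbook form. The notation below is an assumption made for illustration and is not taken from the abstract: control-affine dynamics with drift f and input maps g_j, a state penalty Q_i, input weights R_ij, value-function estimates \hat{V}_i, policy estimates \hat{u}_j, and an identifier output \hat{\dot{x}} that stands in for the unknown state derivative. The paper's exact definitions may differ.

```latex
% Assumed control-affine dynamics with N players (f is unknown)
\dot{x} = f(x) + \sum_{j=1}^{N} g_j(x)\, u_j

% Infinite-horizon quadratic cost of player i
J_i = \int_{0}^{\infty} r_i(x, u_1, \dots, u_N)\, dt,
\qquad
r_i = Q_i(x) + \sum_{j=1}^{N} u_j^{\top} R_{ij}\, u_j

% Nash equilibrium: no player gains by deviating unilaterally
J_i(u_i^{*}, u_{-i}^{*}) \le J_i(u_i, u_{-i}^{*}),
\qquad i = 1, \dots, N

% One plausible form of a Bellman error that avoids f:
% the identifier's derivative estimate replaces the true \dot{x}
\delta_i = \nabla \hat{V}_i(x)\, \hat{\dot{x}}
         + r_i(x, \hat{u}_1, \dots, \hat{u}_N)
```

Under this reading, the critic weights would be fitted to drive \delta_i toward zero by least squares, while the actor weights follow a gradient of the same error. That is how the phrase "independent of the system dynamics" can be interpreted, since only the identifier output, and not f itself, enters \delta_i.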
