Iterative Adaptive Dynamic Programming for Solving Unknown Nonlinear Zero-Sum Game Based on Online Data.

Publication information

IEEE Trans Neural Netw Learn Syst. 2017 Mar;28(3):714-725. doi: 10.1109/TNNLS.2016.2561300. Epub 2016 May 27.

Abstract

H∞ control is a powerful method to solve the disturbance attenuation problems that occur in some control systems. The design of such controllers relies on solving the zero-sum game (ZSG). But in practical applications, the exact dynamics is mostly unknown. Identification of dynamics also produces errors that are detrimental to the control performance. To overcome this problem, an iterative adaptive dynamic programming algorithm is proposed in this paper to solve the continuous-time, unknown nonlinear ZSG with only online data. A model-free approach to the Hamilton-Jacobi-Isaacs equation is developed based on the policy iteration method. Control and disturbance policies and value are approximated by neural networks (NNs) under the critic-actor-disturber structure. The NN weights are solved by the least-squares method. According to the theoretical analysis, our algorithm is equivalent to a Gauss-Newton method solving an optimization problem, and it converges uniformly to the optimal solution. The online data can also be used repeatedly, which is highly efficient. Simulation results demonstrate its feasibility to solve the unknown nonlinear ZSG. When compared with other algorithms, it saves a significant amount of online measurement time.
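As a rough illustration of the policy-iteration structure mentioned in the abstract, the sketch below works through a scalar linear-quadratic zero-sum game. It is not the paper's algorithm: it assumes the dynamics are known (the paper's method is model-free and uses neural networks fitted by least squares), and the system dx/dt = a·x + b·u + k·w, the cost weights q, r, gamma, and the function name are all hypothetical choices made for this example. With a quadratic value V(x) = p·x², each iteration evaluates the current control and disturbance policies and then improves both at once, which in this scalar case coincides with a Newton step on the game Riccati equation.

```python
import math


def policy_iteration_scalar_zsg(a, b, k, q, r, gamma, p0=0.0, iters=20):
    """Simultaneous policy iteration for a scalar linear-quadratic zero-sum game.

    Illustrative assumptions (not taken from the paper):
      dynamics  dx/dt = a*x + b*u + k*w
      cost      integral of q*x^2 + r*u^2 - gamma^2*w^2
      value     V(x) = p*x^2
    """
    p = p0
    history = [p]
    for _ in range(iters):
        # Policy improvement: minimize the Hamiltonian over u, maximize over w.
        K = b * p / r             # control policy      u = -K * x
        L = k * p / gamma ** 2    # disturbance policy  w =  L * x
        a_cl = a - b * K + k * L  # closed-loop pole under both policies
        if a_cl >= 0:
            raise ValueError("closed loop is not stable; evaluation is undefined")
        # Policy evaluation: solve 2*p*a_cl + q + r*K^2 - gamma^2*L^2 = 0 for p.
        p = (q + r * K ** 2 - gamma ** 2 * L ** 2) / (-2.0 * a_cl)
        history.append(p)
    return p, history


# Hypothetical parameters: open-loop stable plant, attenuation level gamma = 2.
a, b, k, q, r, gamma = -1.0, 1.0, 1.0, 1.0, 1.0, 2.0
p_star, history = policy_iteration_scalar_zsg(a, b, k, q, r, gamma)

# Closed-form positive root of the game Riccati equation c*p^2 - 2*a*p - q = 0.
c = b ** 2 / r - k ** 2 / gamma ** 2
p_exact = (a + math.sqrt(a ** 2 + c * q)) / c
print(history[:4], p_exact)
```

In the paper's setting the evaluation step would instead be carried out from measured online data, with the value and both policies represented by neural networks; the sketch only shows the evaluate-then-improve loop those components sit inside.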
