555 Engineering North, Division of Engineering Technology, Oklahoma State University, Stillwater, OK 74078, United States of America.
Washington University, St. Louis, MO, United States of America.
Neural Netw. 2020 Apr;124:95-108. doi: 10.1016/j.neunet.2019.12.031. Epub 2020 Jan 14.
In this paper, we propose a novel differential-game-based neural network (NN) control architecture to solve an optimal control problem for a class of large-scale nonlinear systems involving N players. We aim to optimize the usage of computational resources and the system performance simultaneously. In particular, the control policies of the N players should cooperatively optimize the performance of the large-scale system, while the sampling intervals of each player should reduce the frequency of feedback execution. To achieve both objectives within a unified design framework, we formulate an optimal control problem that integrates both design requirements, which leads to a multi-player differential game. A solution to this problem is obtained numerically by solving the associated Hamilton-Jacobi (HJ) equation online and forward in time using event-driven approximate dynamic programming (E-ADP) and artificial NNs. We employ critic NNs to approximate the solution of the HJ equation, i.e., the optimal value function, using aperiodically available feedback information. Using the NN-approximated value function, we design the control policies and the sampling schemes. Finally, to analyze stability and convergence, the event-driven N-player system is remodeled as a hybrid dynamical system with impulsive weight-update rules. Closed-loop practical stability of the system and Zeno-free behavior of the sampling scheme are demonstrated using the Lyapunov method. Simulation results on a numerical example are included to substantiate the analytical results.
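The event-driven scheme summarized in the abstract (held samples, per-player triggering, critic weights updated impulsively only at events) can be illustrated on a far smaller problem. The sketch below is not the authors' algorithm and is only a minimal sketch under stated assumptions: a scalar linear plant dx/dt = a x + b1 u1 + b2 u2 with known dynamics, a cooperative quadratic cost (integral of q x^2 + r1 u1^2 + r2 u2^2) shared by two players, and a single quadratic critic V_hat(x) = w x^2 standing in for the paper's critic NNs. Each player applies the greedy policy at its last sampled state and resamples only when the hypothetical trigger |x - x_hat_i| > sigma_i |x| + eps fires; at each event the critic weight takes one normalized gradient step on the squared HJ residual. The function name `simulate`, the trigger rule, and all gains are illustrative choices.

```python
import math


def simulate(a=1.0, b=(1.0, 0.5), r=(1.0, 1.0), q=1.0,
             x0=3.0, w0=3.0, sigma=(0.2, 0.1), eps=1e-3,
             alpha=0.5, dt=1e-3, T=10.0):
    """Two-player event-triggered LQ sketch with an impulsively updated critic.

    Assumed plant: dx/dt = a*x + b[0]*u0 + b[1]*u1 (scalar, known dynamics).
    Shared cooperative cost: integral of q*x^2 + r[0]*u0^2 + r[1]*u1^2.
    Critic (hypothetical, not a NN): V_hat(x) = w * x^2.
    """
    s = sum(bi * bi / ri for bi, ri in zip(b, r))  # s = sum_i b_i^2 / r_i
    x, w = x0, w0
    xhat = [x0, x0]            # state each player sampled at its last event
    last_event = [0.0, 0.0]
    events = [0, 0]
    min_gap = math.inf
    steps = int(round(T / dt))
    for k in range(1, steps + 1):
        t = k * dt
        # Greedy policy from the critic, u_i = -(b_i / (2 r_i)) dV_hat/dx,
        # evaluated at the held sample; constant between events.
        u = [-(bi * w / ri) * xh for bi, ri, xh in zip(b, r, xhat)]
        x += dt * (a * x + sum(bi * ui for bi, ui in zip(b, u)))  # forward Euler
        for i in range(2):
            # Hypothetical trigger; eps > 0 bounds inter-event times away
            # from zero near x = 0 (practical, not asymptotic, stability).
            if abs(x - xhat[i]) > sigma[i] * abs(x) + eps:
                xhat[i] = x
                events[i] += 1
                min_gap = min(min_gap, t - last_event[i])
                last_event[i] = t
                # Impulsive critic update at the event instant. With the
                # greedy policy plugged in, the HJ residual is x^2 * R(w).
                R = q + 2 * a * w - s * w * w
                g = x * x * (2 * a - 2 * s * w)  # d(residual)/dw
                w -= alpha * (x * x * R) * g / (1.0 + g * g)  # normalized gradient step
    return x, w, events, steps, min_gap


if __name__ == "__main__":
    x, w, events, steps, min_gap = simulate()
    s = 1.0 ** 2 / 1.0 + 0.5 ** 2 / 1.0
    w_star = (1.0 + math.sqrt(1.0 + 1.0 * s)) / s  # stabilizing root of q + 2aw - s*w^2 = 0
    print(f"x(T)={x:.2e}  w={w:.4f}  w*={w_star:.4f}  events={events} of {steps} steps")
```

For these defaults the HJ equation reduces to q + 2aw - s w^2 = 0 with s = b1^2/r1 + b2^2/r2 = 1.25, whose stabilizing root is w* = 2, so the learned weight can be checked against it. The positive eps is what keeps the sampling Zeno-free near the origin, at the cost of convergence only to a small neighborhood of x = 0, which mirrors the practical-stability notion in the abstract.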