Xiong Chunping, Ma Qian, Guo Jian, Lewis Frank L
IEEE Trans Neural Netw Learn Syst. 2024 Nov;35(11):15984-15992. doi: 10.1109/TNNLS.2023.3291542. Epub 2024 Oct 29.
This article studies the optimal synchronization of linear heterogeneous multiagent systems (MASs) with partially unknown system dynamics. The objective is to achieve system synchronization while minimizing the performance index of each agent. A framework of heterogeneous multiagent graphical games is formulated first. In the graphical games, it is proved that the optimal control policy, which relies on the solution of the Hamilton-Jacobi-Bellman (HJB) equation, is not only a Nash equilibrium but also the best response to the fixed control policies of its neighbors. To obtain the optimal control policy and the minimum value of the performance index, a model-based policy iteration (PI) algorithm is proposed. Then, building on the model-based algorithm, a data-based off-policy integral reinforcement learning (IRL) algorithm is put forward to handle the partially unknown system dynamics. Furthermore, a single-critic neural network (NN) structure is used to implement the data-based algorithm. Based on the data collected by the behavior policy of the data-based off-policy algorithm, the gradient descent method is used to train the NNs to approach the ideal weights. In addition, it is proved that all the proposed algorithms converge and that the weight-tuning law of the single-critic NNs promotes optimal synchronization. Finally, a numerical example is given to demonstrate the effectiveness of the theoretical analysis.
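To illustrate the model-based policy iteration step the abstract describes, the following is a minimal sketch for a single linear agent, i.e., classical continuous-time PI (Kleinman's algorithm): policy evaluation solves a Lyapunov equation for the value matrix P, and policy improvement updates the gain as K = R^{-1} B^T P. The system matrices A, B and weights Q, R below are an illustrative double-integrator example, not the paper's multiagent graphical-game setup, which couples each agent's value function to its neighbors' policies.

```python
import numpy as np

def lyap(A_cl, M):
    """Solve A_cl^T P + P A_cl = -M for symmetric P via vectorization."""
    n = A_cl.shape[0]
    I = np.eye(n)
    L = np.kron(I, A_cl.T) + np.kron(A_cl.T, I)
    P = np.linalg.solve(L, -M.reshape(-1, order="F")).reshape(n, n, order="F")
    return 0.5 * (P + P.T)  # symmetrize against round-off

def policy_iteration(A, B, Q, R, K0, iters=20):
    """Model-based PI (Kleinman's algorithm) for the continuous-time
    LQR problem, starting from a stabilizing gain K0."""
    K = K0
    for _ in range(iters):
        P = lyap(A - B @ K, Q + K.T @ R @ K)   # policy evaluation
        K = np.linalg.solve(R, B.T @ P)        # policy improvement
    return P, K

# Double-integrator example: the iterates converge to the algebraic
# Riccati equation solution P = [[sqrt(3), 1], [1, sqrt(3)]].
A = np.array([[0., 1.], [0., 0.]])
B = np.array([[0.], [1.]])
Q, R = np.eye(2), np.eye(1)
K0 = np.array([[1., 1.]])  # any stabilizing initial gain
P, K = policy_iteration(A, B, Q, R, K0)
```

The data-based off-policy IRL algorithm in the article replaces the Lyapunov solve, which requires knowledge of A, with an equation driven by trajectory data collected under a behavior policy, so the same fixed point is reached without the unknown dynamics.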