Liu Qiwei, Yan Huaicheng, Zhang Hao, Wang Meng, Tian Yongxiao
IEEE Trans Cybern. 2024 Dec;54(12):7865-7876. doi: 10.1109/TCYB.2024.3419056. Epub 2024 Nov 27.
In this article, a novel model-free policy gradient reinforcement learning algorithm is proposed to solve the tracking problem for discrete-time heterogeneous multiagent systems with external disturbances over switching topologies. The dynamics of both the followers and the leader are unknown, and the leader's information is not directly available to every agent under the switching topology. Therefore, a distributed adaptive observer is introduced so that each agent can learn the leader's dynamic model and estimate its state. For the tracking problem, an exponentially discounted value function is established and the related discrete-time game algebraic Riccati equation (DTGARE) is derived, whose solution yields the control strategy. Furthermore, a data-based policy gradient algorithm is proposed to approximate the solution of the DTGARE online, avoiding the need for accurate knowledge of the agents' dynamics. To improve data efficiency, an offline dataset and an experience replay scheme are used. In addition, a lower bound on the exponential discount factor is derived to guarantee stability of the closed-loop systems. Finally, a simulation is provided to demonstrate the validity of the proposed method.
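As a sketch of the setup described above (the symbols below are generic assumptions for a discounted zero-sum linear-quadratic game with dynamics $x_{k+1}=Ax_k+Bu_k+Dw_k$, not notation taken from the paper), an exponentially discounted value function over tracking error $e_k$, control $u_k$, and disturbance $w_k$ typically has the form

\[
V(e_k) = \sum_{i=k}^{\infty} \gamma^{\,i-k}\left( e_i^{\top} Q e_i + u_i^{\top} R u_i - \beta^{2} w_i^{\top} w_i \right), \qquad 0 < \gamma \le 1,
\]

where $Q \succeq 0$ and $R \succ 0$ are weighting matrices and $\beta$ is the disturbance-attenuation level. Writing the minimax value as $V^{*}(e_k) = e_k^{\top} P e_k$, the corresponding discounted game algebraic Riccati equation takes the standard form

\[
P = Q + \gamma A^{\top} P A - \gamma^{2} A^{\top} P
\begin{bmatrix} B & D \end{bmatrix}
\begin{bmatrix} R + \gamma B^{\top} P B & \gamma B^{\top} P D \\ \gamma D^{\top} P B & \gamma D^{\top} P D - \beta^{2} I \end{bmatrix}^{-1}
\begin{bmatrix} B & D \end{bmatrix}^{\top} P A,
\]

which a model-free policy gradient iteration can approximate from measured data without using $A$, $B$, or $D$.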