School of Artificial Intelligence and Automation, Image Processing and Intelligent Control Key Laboratory of Education Ministry of China, Huazhong University of Science and Technology, Wuhan 430074, China.
School of Automation, Nanjing University of Posts and Telecommunications, Nanjing 210023, China.
Chaos. 2019 Oct;29(10):103127. doi: 10.1063/1.5120106.
This paper addresses the consensus problem for discrete-time multiagent systems (DTMASs) that are subject to input saturation and whose agent dynamics are unknown. In previous work, DTMASs with input saturation achieved semiglobal consensus via the low-gain feedback (LGF) method, but computing the LGF matrices by solving a modified algebraic Riccati equation requires knowledge of the agent dynamics. In this paper, motivated by reinforcement learning, we propose a model-free Q-learning algorithm for obtaining the LGF matrices with which the DTMASs achieve global consensus. First, we define a Q-learning function and derive a Q-learning Bellman equation whose solution yields the LGF matrix. We then develop an iterative Q-learning algorithm that obtains the LGF matrix without any knowledge of the agent dynamics, under which the DTMASs achieve global consensus. Finally, simulation results validate the effectiveness of the Q-learning algorithm and show how the agents' initial states and the input saturation limit affect the convergence rate.
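The iterative scheme summarized above, policy evaluation via a Q-learning Bellman equation followed by policy improvement, can be sketched for a single agent as follows. This is a minimal illustrative sketch, not the paper's algorithm: the dynamics `A`, `B`, the low-gain state weight `eps`, the initial stabilizing gain `K`, and all numerical values are assumptions chosen for illustration. The system matrices are used only to generate transition data; the learner itself never reads them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical agent dynamics (a double integrator); used ONLY to
# generate state transitions -- the Q-learning loop never reads A or B.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
n, m = 2, 1
d = n + m

eps = 0.01                 # low-gain parameter: a small state weight
Qc = eps * np.eye(n)       # keeps the learned feedback gain small
R = np.eye(m)

def phi(z):
    """Quadratic basis: upper-triangular entries of z z^T."""
    return np.outer(z, z)[np.triu_indices(d)]

def theta_to_H(theta):
    """Recover the symmetric H with z^T H z = theta . phi(z)."""
    M = np.zeros((d, d))
    M[np.triu_indices(d)] = theta
    return (M + M.T) / 2.0

# An initial stabilizing gain is assumed available (standard for
# policy-iteration-style Q-learning); states are kept small enough
# that input saturation stays inactive in this sketch.
K = np.array([[1.0, 2.0]])

for _ in range(15):
    # Policy evaluation: least squares on the Q-learning Bellman
    # equation  Q(x,u) = x'Qc x + u'R u + Q(x_next, -K x_next).
    Phi, y = [], []
    for _ in range(80):
        x = rng.uniform(-1, 1, n)
        u = -K @ x + rng.uniform(-0.5, 0.5, m)    # exploration noise
        xn = A @ x + B @ u
        un = -K @ xn                              # on-policy next input
        Phi.append(phi(np.concatenate([x, u])) -
                   phi(np.concatenate([xn, un])))
        y.append(x @ Qc @ x + u @ R @ u)
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = theta_to_H(theta)

    # Policy improvement: u = -inv(H_uu) H_ux x minimizes Q over u.
    Hux, Huu = H[n:, :n], H[n:, n:]
    K = np.linalg.solve(Huu, Hux)

print("learned low-gain feedback K =", K)
```

Because the dynamics are deterministic and the cost is quadratic, the least-squares Bellman fit is exact once the exploration noise makes the regressors span the quadratic basis, and the improvement step reduces to Hewer-style policy iteration, so the learned gain converges to the one the (here unused) modified Riccati equation would give.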