IEEE Trans Cybern. 2018 Nov;48(11):3197-3207. doi: 10.1109/TCYB.2017.2761878. Epub 2017 Oct 30.
This paper investigates the optimal robust output containment problem for general linear heterogeneous multiagent systems (MAS) with completely unknown dynamics. A model-based algorithm using offline policy iteration (PI) is first developed, in which the p-copy internal model principle is used to handle system parameter variations. This offline PI algorithm requires the nominal model of each agent, which may not be available in many real-world applications. To address this issue, a discounted performance function is introduced to recast the optimal robust output containment problem as an optimal output-feedback design problem with bounded L2-gain. To solve this problem online in real time, a Bellman equation is first developed that simultaneously evaluates a given control policy and finds the updated control policy, using only the state/output information measured online. Then, based on this Bellman equation, a model-free off-policy integral reinforcement learning algorithm is proposed to solve the optimal robust output containment problem of heterogeneous MAS in real time, without requiring any knowledge of the system dynamics. Simulation results verify the effectiveness of the proposed method.
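To make the offline policy-iteration step concrete, the following is a minimal sketch of model-based PI (Kleinman's algorithm) for a single continuous-time linear-quadratic subproblem: policy evaluation solves a Lyapunov equation for the current gain, and policy improvement updates the gain from the resulting value matrix. All matrices (A, B, Q, R) and the initial gain are illustrative assumptions, not taken from the paper, which addresses the full multiagent output containment setting.

```python
# Sketch of model-based policy iteration (Kleinman's algorithm) for one
# continuous-time LQR subproblem. A, B, Q, R and the initial gain K are
# illustrative assumptions only.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

A = np.array([[0.0, 1.0], [-1.0, -2.0]])  # Hurwitz, so K = 0 is stabilizing
B = np.array([[0.0], [1.0]])
Q = np.eye(2)            # state weighting
R = np.array([[1.0]])    # control weighting

K = np.zeros((1, 2))     # initial stabilizing gain
for _ in range(20):
    Ac = A - B @ K
    # Policy evaluation: solve Ac' P + P Ac = -(Q + K' R K)
    P = solve_continuous_lyapunov(Ac.T, -(Q + K.T @ R @ K))
    # Policy improvement: K = R^{-1} B' P
    K = np.linalg.solve(R, B.T @ P)

# PI converges to the solution of the algebraic Riccati equation
P_are = solve_continuous_are(A, B, Q, R)
print(np.allclose(P, P_are))
```

The model-free off-policy IRL algorithm in the paper replaces the Lyapunov-equation step with a Bellman equation evaluated from online state/output data, so the model matrices above are no longer needed.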