Hu Yifan, Fu Junjie, Wen Guanghui
IEEE Trans Neural Netw Learn Syst. 2025 Jan;36(1):665-676. doi: 10.1109/TNNLS.2023.3329530. Epub 2025 Jan 7.
Learning distributed cooperative policies for large-scale multirobot systems remains a challenging task in the multiagent reinforcement learning (MARL) context. In this work, we model the interactions among the robots as a graph and propose a novel off-policy actor-critic MARL algorithm that trains distributed coordination policies on the graph by leveraging the information-extraction capability of graph neural networks (GNNs). First, a new type of Gaussian policy parameterized by GNNs is designed for distributed decision-making in continuous action spaces. Second, a scalable centralized value function network is designed based on a novel GNN-based value function decomposition technique. Then, building on the designed actor and critic networks, a GNN-based MARL algorithm named graph soft actor-critic (G-SAC) is proposed and used to train the distributed policies in an effective, centralized fashion. Finally, two custom multirobot coordination environments are built, and simulations are performed in them to empirically demonstrate the sample efficiency and scalability of G-SAC as well as the strong zero-shot generalization ability of the trained policy in large-scale multirobot coordination problems.
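To illustrate the core idea of a GNN-parameterized Gaussian policy for distributed decision-making, the following is a minimal NumPy sketch, not the authors' architecture: it assumes one round of mean-aggregation message passing with a shared weight matrix (so every robot runs the same local computation over its own and its neighbors' features), followed by per-robot Gaussian action heads. All function names, layer sizes, and the log-std clipping range are illustrative assumptions.

```python
import numpy as np

def gnn_gaussian_policy(X, A, W_msg, W_mu, W_logstd, rng):
    """One message-passing round followed by per-robot Gaussian action heads.

    X: (n, d) node features; A: (n, n) adjacency (A[i, j] = 1 if robot j is a
    neighbor of robot i). Weights are shared across robots, so the policy is
    distributed: robot i only needs its own and its neighbors' features.
    This is a hypothetical sketch, not the architecture from the paper.
    """
    deg = A.sum(axis=1, keepdims=True) + 1.0       # neighbor count incl. self
    H = np.tanh((X + A @ X) / deg @ W_msg)         # mean-aggregated embedding
    mu = H @ W_mu                                  # per-robot action mean
    log_std = np.clip(H @ W_logstd, -5.0, 2.0)     # bounded std for stability
    actions = mu + np.exp(log_std) * rng.standard_normal(mu.shape)
    return actions, mu, log_std

rng = np.random.default_rng(0)
n, d, h, a = 4, 3, 8, 2                            # robots, feature/hidden/action dims
X = rng.standard_normal((n, d))
A = np.array([[0, 1, 0, 0],                        # a simple line graph
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W_msg = 0.1 * rng.standard_normal((d, h))
W_mu = 0.1 * rng.standard_normal((h, a))
W_logstd = 0.1 * rng.standard_normal((h, a))
actions, mu, log_std = gnn_gaussian_policy(X, A, W_msg, W_mu, W_logstd, rng)
print(actions.shape)  # one continuous 2-D action per robot: (4, 2)
```

In training, the sampled actions and the Gaussian log-probabilities derived from `mu` and `log_std` would feed the soft actor-critic objective; here only the distributed forward pass is shown.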