Hu Yifan, Fu Junjie, Wen Guanghui
IEEE Trans Neural Netw Learn Syst. 2025 Jan;36(1):665-676. doi: 10.1109/TNNLS.2023.3329530. Epub 2025 Jan 7.
Learning distributed cooperative policies for large-scale multirobot systems remains a challenging task in the multiagent reinforcement learning (MARL) context. In this work, we model the interactions among the robots as a graph and propose a novel off-policy actor-critic MARL algorithm that trains distributed coordination policies on the graph by leveraging the information-extraction capability of graph neural networks (GNNs). First, a new type of Gaussian policy parameterized by GNNs is designed for distributed decision-making in continuous action spaces. Second, a scalable centralized value function network is designed based on a novel GNN-based value function decomposition technique. Then, building on the designed actor and critic networks, a GNN-based MARL algorithm named graph soft actor-critic (G-SAC) is proposed and used to train the distributed policies in an effective, centralized fashion. Finally, two custom multirobot coordination environments are built, and simulations are performed in them to empirically demonstrate the sample efficiency and scalability of G-SAC as well as the strong zero-shot generalization ability of the trained policy in large-scale multirobot coordination problems.
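To illustrate the core idea of a GNN-parameterized Gaussian policy for distributed decision-making, the following is a minimal NumPy sketch, not the authors' architecture: it assumes one round of mean-aggregation message passing with a shared weight matrix (so every robot runs the same local computation over its own and its neighbors' features), followed by per-robot Gaussian action heads. All function names, layer sizes, and the log-std clipping range are illustrative assumptions.

```python
import numpy as np

def gnn_gaussian_policy(X, A, W_msg, W_mu, W_logstd, rng):
    """One message-passing round followed by per-robot Gaussian action heads.

    X: (n, d) node features; A: (n, n) adjacency (A[i, j] = 1 if robot j is a
    neighbor of robot i). Weights are shared across robots, so the policy is
    distributed: robot i only needs its own and its neighbors' features.
    This is a hypothetical sketch, not the architecture from the paper.
    """
    deg = A.sum(axis=1, keepdims=True) + 1.0       # neighbor count incl. self
    H = np.tanh((X + A @ X) / deg @ W_msg)         # mean-aggregated embedding
    mu = H @ W_mu                                  # per-robot action mean
    log_std = np.clip(H @ W_logstd, -5.0, 2.0)     # bounded std for stability
    actions = mu + np.exp(log_std) * rng.standard_normal(mu.shape)
    return actions, mu, log_std

rng = np.random.default_rng(0)
n, d, h, a = 4, 3, 8, 2                            # robots, feature/hidden/action dims
X = rng.standard_normal((n, d))
A = np.array([[0, 1, 0, 0],                        # a simple line graph
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W_msg = 0.1 * rng.standard_normal((d, h))
W_mu = 0.1 * rng.standard_normal((h, a))
W_logstd = 0.1 * rng.standard_normal((h, a))
actions, mu, log_std = gnn_gaussian_policy(X, A, W_msg, W_mu, W_logstd, rng)
print(actions.shape)  # one continuous 2-D action per robot: (4, 2)
```

In training, the sampled actions and the Gaussian log-probabilities derived from `mu` and `log_std` would feed the soft actor-critic objective; here only the distributed forward pass is shown.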