Xie Donghan, Wang Zhi, Chen Chunlin, Dong Daoyi
IEEE Trans Neural Netw Learn Syst. 2024 Jun;35(6):8557-8569. doi: 10.1109/TNNLS.2022.3230701. Epub 2024 Jun 3.
Multi-agent settings remain a fundamental challenge in the reinforcement learning (RL) domain due to partial observability and the lack of accurate real-time interactions among agents. In this article, we propose a new method based on local communication learning to tackle the multi-agent RL (MARL) challenge in settings where a large number of agents coexist. First, we design a new communication protocol that exploits the ability of depthwise convolution to efficiently extract local relations and learn local communication between neighboring agents. To facilitate multi-agent coordination, we explicitly learn the effect of joint actions by taking the policies of neighboring agents as inputs. Second, we introduce the mean-field approximation into our method to reduce the scale of agent interactions. To coordinate the behaviors of neighboring agents more effectively, we enhance the mean-field approximation with a supervised policy rectification network (PRN) that rectifies real-time agent interactions and a learnable compensation term that corrects the approximation bias. The proposed method enables efficient coordination and outperforms several baseline approaches on the adaptive traffic signal control (ATSC) task and the StarCraft II multi-agent challenge (SMAC).
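The abstract names two ingredients: a depthwise-convolution communication protocol over neighboring agents and a mean-field summary of neighbor actions. The sketch below is an illustrative PyTorch module (not the authors' code); the module name, feature layout, and pooling choices are assumptions made only to show how depthwise convolution can aggregate local agent features and how a local mean of neighbor actions can feed a Q-head.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalCommQNet(nn.Module):
    """Hypothetical sketch: per-agent Q-values conditioned on
    (i) features aggregated from neighbors via a depthwise convolution and
    (ii) the mean action of nearby agents (mean-field approximation)."""

    def __init__(self, feat_dim: int, n_actions: int, kernel_size: int = 3):
        super().__init__()
        # Depthwise convolution: one filter per feature channel, so local
        # relations are extracted cheaply over the agents' spatial layout.
        self.local_comm = nn.Conv2d(
            feat_dim, feat_dim, kernel_size,
            padding=kernel_size // 2, groups=feat_dim,
        )
        self.q_head = nn.Sequential(
            nn.Linear(feat_dim + n_actions, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, agent_feats: torch.Tensor, agent_actions: torch.Tensor):
        # agent_feats:  (batch, feat_dim, H, W) grid of per-agent features
        # agent_actions: (batch, H, W, n_actions) one-hot actions on the grid
        comm = self.local_comm(agent_feats)          # local message passing
        comm = comm.permute(0, 2, 3, 1)              # (batch, H, W, feat_dim)
        # Mean-field approximation: summarize neighbors by the average action
        # in a local window instead of modeling each pairwise interaction.
        acts = agent_actions.permute(0, 3, 1, 2)     # (batch, n_actions, H, W)
        mean_action = F.avg_pool2d(acts, kernel_size=3, stride=1, padding=1)
        mean_action = mean_action.permute(0, 2, 3, 1)
        q = self.q_head(torch.cat([comm, mean_action], dim=-1))
        return q                                      # (batch, H, W, n_actions)
```

The paper's PRN and learnable compensation term would refine `mean_action` further; they are omitted here because the abstract does not specify their form.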