Li Chao, Dong Shaokang, Yang Shangdong, Hu Yujing, Li Wenbin, Gao Yang
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China.
School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China.
Neural Netw. 2025 Feb;182:106858. doi: 10.1016/j.neunet.2024.106858. Epub 2024 Nov 12.
Many real-world multi-agent tasks exhibit a nearly decomposable structure, in which agents within the same interaction set interact strongly while agents in different sets interact only weakly. Modeling this structure efficiently and exploiting it to coordinate agents can improve the learning efficiency of multi-agent reinforcement learning algorithms on cooperative tasks, yet existing methods typically fail to do so. To overcome this limitation, this paper proposes a novel algorithm named Dual Collaborative Constraints (DCC), which identifies the interaction sets as subtasks and achieves both intra-subtask and inter-subtask coordination. Specifically, DCC employs a bi-level structure to periodically assign agents to multiple subtasks, and introduces local and global collaborative constraints based on mutual information to facilitate both intra-subtask and inter-subtask coordination among agents. Together, these two constraints ensure that agents within the same subtask reach a consensus on their local action selections and that all agents select superior joint actions that maximize overall task performance. We evaluate DCC on various cooperative multi-agent tasks, where it outperforms multiple state-of-the-art baselines, demonstrating its effectiveness.
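The abstract does not say how the mutual-information terms are computed, so the following is only an illustrative sketch of the underlying quantity, not DCC's constraints. It assumes discrete actions and paired samples from two agents, and estimates I(A; B) from empirical counts; the function name `empirical_mutual_information` is hypothetical.

```python
import math
from collections import Counter


def empirical_mutual_information(actions_a, actions_b):
    """Estimate I(A; B) in nats from paired samples of two agents' discrete actions.

    Illustrative plug-in estimator only; it is an assumption of this sketch,
    not the estimator used by DCC.
    """
    if len(actions_a) != len(actions_b) or len(actions_a) == 0:
        raise ValueError("need two non-empty action sequences of equal length")

    n = len(actions_a)
    joint = Counter(zip(actions_a, actions_b))  # counts of (a, b) pairs
    count_a = Counter(actions_a)                # marginal counts for agent A
    count_b = Counter(actions_b)                # marginal counts for agent B

    mi = 0.0
    for (a, b), c in joint.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written in terms of counts
        mi += (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
    return mi


# Agents that always pick the same action share log(2) nats over two actions.
mi_consensus = empirical_mutual_information([0, 1, 0, 1], [0, 1, 0, 1])
# Agents whose actions are empirically independent share no information.
mi_independent = empirical_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

In this toy setting, a high value corresponds to agents whose action choices are strongly coupled, which is the intuition behind using mutual information to encourage consensus within a subtask.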