Fu Songchen, Zhao Shaojing, Li Ta, Yan Yonghong
Laboratory of Speech and Intelligent Information Processing, Institute of Acoustics, CAS, Beijing, China; University of Chinese Academy of Sciences, Beijing, China.
Neural Netw. 2025 Apr;184:107093. doi: 10.1016/j.neunet.2024.107093. Epub 2024 Dec 29.
In multi-agent cooperative tasks, the presence of heterogeneous agents is common. Compared with cooperation among homogeneous agents, heterogeneous collaboration requires considering which sub-tasks best suit each agent. However, the operation of multi-agent systems often involves a large amount of complex interaction information, making heterogeneous strategies more challenging to learn. Related multi-agent reinforcement learning methods sometimes use grouping mechanisms to form smaller cooperative groups, or leverage prior domain knowledge to learn strategies for different roles. Ideally, however, agents should learn deeper role features without relying on such additional information. We therefore propose QTypeMix, which divides the value decomposition process into a homogeneous stage and a heterogeneous stage. QTypeMix learns to extract type features from local observation histories through the TE loss. In addition, we introduce network structures containing attention mechanisms and hypernetworks to enhance representation capability and carry out the value decomposition. Testing the proposed method on 14 maps from SMAC and SMACv2 shows that QTypeMix achieves state-of-the-art performance on tasks of varying difficulty.
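The abstract does not include implementation details, so as a rough illustration only, the sketch below shows what a two-stage, hypernetwork-based value decomposition could look like in PyTorch: per-type mixers aggregate Q-values of same-type agents (homogeneous stage), and a global mixer combines the per-type values (heterogeneous stage). All class names, dimensions, and the per-type grouping are assumptions rather than the authors' code; the TE loss and attention modules are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HyperMixer(nn.Module):
    """QMIX-style mixing layer (illustrative): hypernetworks conditioned on the
    global state generate non-negative weights that mix the input Q-values."""

    def __init__(self, n_inputs: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_inputs, self.embed_dim = n_inputs, embed_dim
        # Hypernetworks map the global state to mixing weights and biases.
        self.hyper_w1 = nn.Linear(state_dim, n_inputs * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(
            nn.Linear(state_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, 1))

    def forward(self, qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # qs: (batch, n_inputs), state: (batch, state_dim)
        w1 = torch.abs(self.hyper_w1(state)).view(-1, self.n_inputs, self.embed_dim)
        b1 = self.hyper_b1(state).view(-1, 1, self.embed_dim)
        hidden = F.elu(torch.bmm(qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(-1, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(-1, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(-1, 1)  # (batch, 1)


class TwoStageMixer(nn.Module):
    """Hypothetical two-stage decomposition: one mixer per agent type, then a
    global mixer over the resulting per-type values."""

    def __init__(self, agents_per_type: list, state_dim: int):
        super().__init__()
        self.type_mixers = nn.ModuleList(
            [HyperMixer(n, state_dim) for n in agents_per_type])
        self.global_mixer = HyperMixer(len(agents_per_type), state_dim)

    def forward(self, per_agent_q: list, state: torch.Tensor) -> torch.Tensor:
        # per_agent_q: one (batch, n_agents_of_type) tensor per agent type.
        type_q = torch.cat(
            [m(q, state) for m, q in zip(self.type_mixers, per_agent_q)], dim=-1)
        return self.global_mixer(type_q, state)  # joint Q_tot: (batch, 1)
```

Monotonic mixing (via the absolute-value constraint on the generated weights) is the standard way such decompositions keep the joint action-value consistent with per-agent greedy action selection; how QTypeMix specifically structures the two stages is detailed in the paper itself.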