
SMIX(λ): Enhancing Centralized Value Functions for Cooperative Multiagent Reinforcement Learning.

Author Information

Yao Xinghu, Wen Chao, Wang Yuhui, Tan Xiaoyang

Publication Information

IEEE Trans Neural Netw Learn Syst. 2023 Jan;34(1):52-63. doi: 10.1109/TNNLS.2021.3089493. Epub 2023 Jan 5.

DOI: 10.1109/TNNLS.2021.3089493
PMID: 34181556
Abstract

Learning a stable and generalizable centralized value function (CVF) is a crucial but challenging task in multiagent reinforcement learning (MARL), as it has to deal with a joint action space that grows exponentially with the number of agents. This article proposes an approach, named SMIX(λ), that achieves this with off-policy training, avoiding the greedy assumption commonly made in CVF learning. As importance sampling for such off-policy training is both computationally costly and numerically unstable, we propose to use the λ-return as a proxy to compute the temporal difference (TD) error. With this new loss objective, we adopt a modified QMIX network structure as the base to train our model. By further connecting it with the Q(λ) approach from a unified expectation-correction viewpoint, we show that the proposed SMIX(λ) is equivalent to Q(λ) and hence shares its convergence properties, without suffering from the aforementioned curse of dimensionality inherent in MARL. Experiments on the StarCraft Multiagent Challenge (SMAC) benchmark demonstrate that our approach not only outperforms several state-of-the-art MARL methods by a large margin but can also be used as a general tool to improve the overall performance of other centralized training with decentralized execution (CTDE)-type algorithms by enhancing their CVFs.
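The mechanism the abstract describes, regressing the CVF onto a λ-return rather than an importance-weighted off-policy target, can be written out in standard notation. The following is a sketch consistent with the abstract, not a reproduction of the paper's exact equations; here $s_t$ is the global state, $\boldsymbol{a}_t$ the joint action, and $Q_{tot}$ the CVF:

$$ G_t^{(n)} = \sum_{i=0}^{n-1} \gamma^{i}\, r_{t+i+1} + \gamma^{n}\, Q_{tot}(s_{t+n}, \boldsymbol{a}_{t+n}), \qquad G_t^{\lambda} = (1-\lambda) \sum_{n=1}^{\infty} \lambda^{n-1} G_t^{(n)}, $$

$$ \mathcal{L}(\theta) = \mathbb{E}\Big[\big(G_t^{\lambda} - Q_{tot}(s_t, \boldsymbol{a}_t; \theta)\big)^2\Big]. $$

Because each n-step return bootstraps from $Q_{tot}$ at the joint actions actually taken, rather than through a max over the exponentially large joint action space, no per-step importance ratios are required; this is one natural reading of "avoiding the greedy assumption."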

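For concreteness, here is a minimal sketch of the backward recursion that produces such λ-return targets from one stored episode. This illustrates the standard λ-return computation, not the authors' released code; the function name lambda_returns and the source of q_next (e.g., a frozen target copy of the mixing network evaluating Q_tot at the recorded joint actions) are assumptions for illustration.

```python
import numpy as np

def lambda_returns(rewards, q_next, dones, gamma=0.99, lam=0.8):
    """Backward lambda-return recursion over one episode.

    rewards : (T,) rewards r_{t+1} observed at each step
    q_next  : (T,) bootstrap values Q_tot(s_{t+1}, a_{t+1}),
              assumed to come from a frozen target network
    dones   : (T,) 1.0 where the episode terminates, else 0.0
    Returns : (T,) targets G_t^lambda for the squared-error loss.
    """
    T = len(rewards)
    returns = np.zeros(T)
    # Last step: nothing left to mix, so use the plain one-step target.
    returns[-1] = rewards[-1] + gamma * (1.0 - dones[-1]) * q_next[-1]
    # G_t = r + gamma * [(1 - lam) * Q(s', a') + lam * G_{t+1}],
    # with bootstrapping zeroed out at terminal states.
    for t in reversed(range(T - 1)):
        returns[t] = rewards[t] + gamma * (1.0 - dones[t]) * (
            (1.0 - lam) * q_next[t] + lam * returns[t + 1]
        )
    return returns
```

The centralized network would then regress onto these targets, e.g. loss = np.mean((lambda_returns(r, q_next, dones) - q_tot) ** 2). At λ = 0 this recovers the one-step TD target; at λ = 1 it recovers the full Monte Carlo return.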

Similar Articles

1. UNMAS: Multiagent Reinforcement Learning for Unshaped Cooperative Scenarios.
   IEEE Trans Neural Netw Learn Syst. 2023 Apr;34(4):2093-2104. doi: 10.1109/TNNLS.2021.3105869. Epub 2023 Apr 4.
2. SATF: A Scalable Attentive Transfer Framework for Efficient Multiagent Reinforcement Learning.
   IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):6627-6641. doi: 10.1109/TNNLS.2024.3387397. Epub 2025 Apr 4.
3. Residual Q-Networks for Value Function Factorizing in Multiagent Reinforcement Learning.
   IEEE Trans Neural Netw Learn Syst. 2024 Feb;35(2):1534-1544. doi: 10.1109/TNNLS.2022.3183865. Epub 2024 Feb 5.
4. Strangeness-driven exploration in multi-agent reinforcement learning.
   Neural Netw. 2024 Apr;172:106149. doi: 10.1016/j.neunet.2024.106149. Epub 2024 Jan 26.
5. TVDO: Tchebycheff Value-Decomposition Optimization for Multiagent Reinforcement Learning.
   IEEE Trans Neural Netw Learn Syst. 2025 Jul;36(7):12521-12534. doi: 10.1109/TNNLS.2024.3455422.
6. Credit assignment with predictive contribution measurement in multi-agent reinforcement learning.
   Neural Netw. 2023 Jul;164:681-690. doi: 10.1016/j.neunet.2023.05.021. Epub 2023 May 20.
7. A Distributional Perspective on Multiagent Cooperation With Deep Reinforcement Learning.
   IEEE Trans Neural Netw Learn Syst. 2024 Mar;35(3):4246-4259. doi: 10.1109/TNNLS.2022.3202097. Epub 2024 Feb 29.
8. VGN: Value Decomposition With Graph Attention Networks for Multiagent Reinforcement Learning.
   IEEE Trans Neural Netw Learn Syst. 2024 Jan;35(1):182-195. doi: 10.1109/TNNLS.2022.3172572. Epub 2024 Jan 4.
9. Multiagent Continual Coordination via Progressive Task Contextualization.
   IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):6326-6340. doi: 10.1109/TNNLS.2024.3394513. Epub 2025 Apr 4.