
SMIX(λ): Enhancing Centralized Value Functions for Cooperative Multiagent Reinforcement Learning.

Author Information

Yao Xinghu, Wen Chao, Wang Yuhui, Tan Xiaoyang

Publication Information

IEEE Trans Neural Netw Learn Syst. 2023 Jan;34(1):52-63. doi: 10.1109/TNNLS.2021.3089493. Epub 2023 Jan 5.

DOI: 10.1109/TNNLS.2021.3089493
PMID: 34181556
Abstract

Learning a stable and generalizable centralized value function (CVF) is a crucial but challenging task in multiagent reinforcement learning (MARL), as it has to deal with a joint action space that grows exponentially with the number of agents. This article proposes an approach, named SMIX(λ), that achieves this with off-policy training, avoiding the greedy assumption commonly made in CVF learning. As importance sampling for such off-policy training is both computationally costly and numerically unstable, we propose to use the λ-return as a proxy to compute the temporal difference (TD) error. With this new loss objective, we adopt a modified QMIX network structure as the base to train our model. By further connecting it with the Q(λ) approach from a unified expectation-correction viewpoint, we show that the proposed SMIX(λ) is equivalent to Q(λ) and hence shares its convergence properties, without suffering from the aforementioned curse of dimensionality inherent in MARL. Experiments on the StarCraft Multiagent Challenge (SMAC) benchmark demonstrate that our approach not only outperforms several state-of-the-art MARL methods by a large margin but can also be used as a general tool to improve the overall performance of other centralized training with decentralized execution (CTDE)-type algorithms by enhancing their CVFs.
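The mechanism the abstract describes, regressing the CVF onto a λ-return rather than an importance-weighted off-policy target, can be written out in standard notation. The following is a sketch consistent with the abstract, not a reproduction of the paper's exact equations; here $s_t$ is the global state, $\boldsymbol{a}_t$ the joint action, and $Q_{tot}$ the CVF:

$$ G_t^{(n)} = \sum_{i=0}^{n-1} \gamma^{i}\, r_{t+i+1} + \gamma^{n}\, Q_{tot}(s_{t+n}, \boldsymbol{a}_{t+n}), \qquad G_t^{\lambda} = (1-\lambda) \sum_{n=1}^{\infty} \lambda^{n-1} G_t^{(n)}, $$

$$ \mathcal{L}(\theta) = \mathbb{E}\Big[\big(G_t^{\lambda} - Q_{tot}(s_t, \boldsymbol{a}_t; \theta)\big)^2\Big]. $$

Because each n-step return bootstraps from $Q_{tot}$ at the joint actions actually taken, rather than through a max over the exponentially large joint action space, no per-step importance ratios are required; this is one natural reading of "avoiding the greedy assumption."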

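For concreteness, here is a minimal sketch of the backward recursion that produces such λ-return targets from one stored episode. This illustrates the standard λ-return computation, not the authors' released code; the function name lambda_returns and the source of q_next (e.g., a frozen target copy of the mixing network evaluating Q_tot at the recorded joint actions) are assumptions for illustration.

```python
import numpy as np

def lambda_returns(rewards, q_next, dones, gamma=0.99, lam=0.8):
    """Backward lambda-return recursion over one episode.

    rewards : (T,) rewards r_{t+1} observed at each step
    q_next  : (T,) bootstrap values Q_tot(s_{t+1}, a_{t+1}),
              assumed to come from a frozen target network
    dones   : (T,) 1.0 where the episode terminates, else 0.0
    Returns : (T,) targets G_t^lambda for the squared-error loss.
    """
    T = len(rewards)
    returns = np.zeros(T)
    # Last step: nothing left to mix, so use the plain one-step target.
    returns[-1] = rewards[-1] + gamma * (1.0 - dones[-1]) * q_next[-1]
    # G_t = r + gamma * [(1 - lam) * Q(s', a') + lam * G_{t+1}],
    # with bootstrapping zeroed out at terminal states.
    for t in reversed(range(T - 1)):
        returns[t] = rewards[t] + gamma * (1.0 - dones[t]) * (
            (1.0 - lam) * q_next[t] + lam * returns[t + 1]
        )
    return returns
```

The centralized network would then regress onto these targets, e.g. loss = np.mean((lambda_returns(r, q_next, dones) - q_tot) ** 2). At λ = 0 this recovers the one-step TD target; at λ = 1 it recovers the full Monte Carlo return.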

Similar Articles

1. UNMAS: Multiagent Reinforcement Learning for Unshaped Cooperative Scenarios.
   IEEE Trans Neural Netw Learn Syst. 2023 Apr;34(4):2093-2104. doi: 10.1109/TNNLS.2021.3105869. Epub 2023 Apr 4.
2. SATF: A Scalable Attentive Transfer Framework for Efficient Multiagent Reinforcement Learning.
   IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):6627-6641. doi: 10.1109/TNNLS.2024.3387397. Epub 2025 Apr 4.
3. Residual Q-Networks for Value Function Factorizing in Multiagent Reinforcement Learning.
   IEEE Trans Neural Netw Learn Syst. 2024 Feb;35(2):1534-1544. doi: 10.1109/TNNLS.2022.3183865. Epub 2024 Feb 5.
4. Strangeness-driven exploration in multi-agent reinforcement learning.
   Neural Netw. 2024 Apr;172:106149. doi: 10.1016/j.neunet.2024.106149. Epub 2024 Jan 26.
5. TVDO: Tchebycheff Value-Decomposition Optimization for Multiagent Reinforcement Learning.
   IEEE Trans Neural Netw Learn Syst. 2025 Jul;36(7):12521-12534. doi: 10.1109/TNNLS.2024.3455422.
6. Credit assignment with predictive contribution measurement in multi-agent reinforcement learning.
   Neural Netw. 2023 Jul;164:681-690. doi: 10.1016/j.neunet.2023.05.021. Epub 2023 May 20.
7. A Distributional Perspective on Multiagent Cooperation With Deep Reinforcement Learning.
   IEEE Trans Neural Netw Learn Syst. 2024 Mar;35(3):4246-4259. doi: 10.1109/TNNLS.2022.3202097. Epub 2024 Feb 29.
8. VGN: Value Decomposition With Graph Attention Networks for Multiagent Reinforcement Learning.
   IEEE Trans Neural Netw Learn Syst. 2024 Jan;35(1):182-195. doi: 10.1109/TNNLS.2022.3172572. Epub 2024 Jan 4.
9. Multiagent Continual Coordination via Progressive Task Contextualization.
   IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):6326-6340. doi: 10.1109/TNNLS.2024.3394513. Epub 2025 Apr 4.