
A fully value distributional deep reinforcement learning framework for multi-agent cooperation.

Authors

Fu Mingsheng, Huang Liwei, Li Fan, Qu Hong, Xu Chengzhong

Affiliations

School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, Sichuan, China.

Section of Epidemiology and Population Health, Department of Obstetrics and Gynecology, West China Second University Hospital, Sichuan University, Chengdu, 610041, Sichuan, China.

Publication

Neural Netw. 2025 Apr;184:107035. doi: 10.1016/j.neunet.2024.107035. Epub 2024 Dec 14.

DOI: 10.1016/j.neunet.2024.107035
PMID: 39693677
Abstract

Distributional Reinforcement Learning (RL) extends beyond estimating the expected value of future returns by modeling its entire distribution, offering greater expressiveness and capturing deeper insights of the value function. To leverage this advantage, distributional multi-agent systems based on value-decomposition techniques were proposed recently. Ideally, a distributional multi-agent system should be fully distributional, which means both the individual and global value functions should be constructed in distributional forms. However, recent studies show that directly applying traditional value-decomposition techniques to this fully distributional form cannot guarantee the satisfaction of the necessary individual-global-max (IGM) principle. To address this problem, we propose a novel fully value distributional multi-agent framework based on value-decomposition and prove that the IGM principle can be guaranteed under our framework. Based on this framework, a practical deep reinforcement learning model called Fully Distributional Multi-Agent Cooperation (FDMAC) is proposed, and the effectiveness of FDMAC is verified under different scenarios of the StarCraft Multi-Agent Challenge micromanagement environment. Further experimental results show that our FDMAC model can outperform the best baseline by 10.47% on average in terms of the median test win rate.
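The individual-global-max (IGM) principle mentioned in the abstract requires that the joint action maximizing the global value function coincide with each agent greedily maximizing its own individual value. A minimal toy sketch (not the paper's FDMAC model; the monotonic weighted-sum mixer below is an assumption in the style of VDN/QMIX value decomposition) illustrates why monotonic mixing of scalar individual values preserves IGM:

```python
import itertools
import numpy as np

# Toy illustration of the IGM principle: with a mixing function that is
# monotonically increasing in each individual value Q_i, the argmax of the
# global value decomposes into per-agent argmaxes.
rng = np.random.default_rng(0)
n_agents, n_actions = 2, 3

# Per-agent individual utilities Q_i(a_i), shape (n_agents, n_actions).
q_individual = rng.normal(size=(n_agents, n_actions))

# Positive weights make the sum monotonic in every Q_i.
weights = np.array([1.0, 2.0])

def q_global(joint_action):
    """Global value as a positive-weighted sum of individual values."""
    return sum(w * q_individual[i, a]
               for i, (w, a) in enumerate(zip(weights, joint_action)))

# Greedy joint action found by exhaustive search over the joint action space.
best_joint = max(itertools.product(range(n_actions), repeat=n_agents),
                 key=q_global)

# Each agent's individually greedy action.
greedy_individual = tuple(int(np.argmax(q_individual[i]))
                          for i in range(n_agents))

assert best_joint == greedy_individual  # IGM holds for monotonic mixing
```

The paper's contribution concerns the harder fully distributional setting, where both individual and global values are distributions rather than scalars and such monotonic decompositions no longer guarantee IGM directly.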


Similar Articles

1
A fully value distributional deep reinforcement learning framework for multi-agent cooperation.
Neural Netw. 2025 Apr;184:107035. doi: 10.1016/j.neunet.2024.107035. Epub 2024 Dec 14.
2
A Distributional Perspective on Multiagent Cooperation With Deep Reinforcement Learning.
IEEE Trans Neural Netw Learn Syst. 2024 Mar;35(3):4246-4259. doi: 10.1109/TNNLS.2022.3202097. Epub 2024 Feb 29.
3
MuDE: Multi-agent decomposed reward-based exploration.
Neural Netw. 2024 Nov;179:106565. doi: 10.1016/j.neunet.2024.106565. Epub 2024 Jul 22.
4
Multi-compartment neuron and population encoding powered spiking neural network for deep distributional reinforcement learning.
Neural Netw. 2025 Feb;182:106898. doi: 10.1016/j.neunet.2024.106898. Epub 2024 Nov 17.
5
Skill matters: Dynamic skill learning for multi-agent cooperative reinforcement learning.
Neural Netw. 2025 Jan;181:106852. doi: 10.1016/j.neunet.2024.106852. Epub 2024 Nov 2.
6
Shared autonomy between human electroencephalography and TD3 deep reinforcement learning: A multi-agent copilot approach.
Ann N Y Acad Sci. 2025 Apr;1546(1):157-172. doi: 10.1111/nyas.15322. Epub 2025 Mar 30.
7
HyperComm: Hypergraph-based communication in multi-agent reinforcement learning.
Neural Netw. 2024 Oct;178:106432. doi: 10.1016/j.neunet.2024.106432. Epub 2024 Jun 10.
8
Hierarchical Attention Master-Slave for heterogeneous multi-agent reinforcement learning.
Neural Netw. 2023 May;162:359-368. doi: 10.1016/j.neunet.2023.02.037. Epub 2023 Mar 4.
9
Constraining an Unconstrained Multi-agent Policy with offline data.
Neural Netw. 2025 Jun;186:107253. doi: 10.1016/j.neunet.2025.107253. Epub 2025 Feb 13.
10
QTypeMix: Enhancing multi-agent cooperative strategies through heterogeneous and homogeneous value decomposition.
Neural Netw. 2025 Apr;184:107093. doi: 10.1016/j.neunet.2024.107093. Epub 2024 Dec 29.