Fu Mingsheng, Huang Liwei, Li Fan, Qu Hong, Xu Chengzhong
School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, Sichuan, China.
Section of Epidemiology and Population Health, Department of Obstetrics and Gynecology, West China Second University Hospital, Sichuan University, Chengdu, 610041, Sichuan, China.
Neural Netw. 2025 Apr;184:107035. doi: 10.1016/j.neunet.2024.107035. Epub 2024 Dec 14.
Distributional Reinforcement Learning (RL) extends beyond estimating the expected value of future returns by modeling their entire distribution, offering greater expressiveness and deeper insight into the value function. To leverage this advantage, distributional multi-agent systems based on value-decomposition techniques have recently been proposed. Ideally, a distributional multi-agent system should be fully distributional, meaning that both the individual and the global value functions are constructed in distributional form. However, recent studies show that directly applying traditional value-decomposition techniques to this fully distributional form cannot guarantee satisfaction of the necessary Individual-Global-Max (IGM) principle. To address this problem, we propose a novel fully value distributional multi-agent framework based on value decomposition and prove that the IGM principle is guaranteed under our framework. Building on this framework, we propose a practical deep reinforcement learning model called Fully Distributional Multi-Agent Cooperation (FDMAC) and verify its effectiveness across different scenarios of the StarCraft Multi-Agent Challenge micromanagement environment. Further experimental results show that FDMAC outperforms the best baseline by 10.47% on average in terms of the median test win rate.
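For context, the two concepts the abstract leans on can be stated in the standard notation of the distributional-RL and value-decomposition literature; this is a sketch of the usual formulations (symbols $Z$, $Q_{tot}$, $Q_i$, $\tau$, $u$ are illustrative), not necessarily the exact notation used in the paper. Distributional RL models the random return $Z$ rather than only its expectation, via the distributional Bellman equation:

\[
Z(s,a) \stackrel{D}{=} R(s,a) + \gamma \, Z(S', A'), \qquad Q(s,a) = \mathbb{E}\big[Z(s,a)\big],
\]

and the IGM principle requires that greedy action selection on each agent's individual value function recovers the greedy joint action of the global value function:

\[
\arg\max_{\mathbf{u}} Q_{tot}(\boldsymbol{\tau}, \mathbf{u}) \;=\; \Big( \arg\max_{u^1} Q_1(\tau^1, u^1), \;\ldots,\; \arg\max_{u^n} Q_n(\tau^n, u^n) \Big).
\]

The difficulty the abstract refers to is that when both $Q_{tot}$ and the $Q_i$ are replaced by full return distributions, naive value-decomposition mixing no longer guarantees this consistency condition.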