Wei Qinglai, Li Yugu, Zhang Jie, Wang Fei-Yue
IEEE Trans Neural Netw Learn Syst. 2024 Jan;35(1):182-195. doi: 10.1109/TNNLS.2022.3172572. Epub 2024 Jan 4.
Although value decomposition networks and follow-on value-based studies factorize the joint reward function into individual reward functions for a class of cooperative multiagent reinforcement learning problems, in which each agent has its own local observation and shares a joint reward signal, most previous efforts ignored the graphical information between agents. In this article, a new value decomposition with graph attention network (VGN) method is developed to solve the value functions by introducing the dynamical relationships between agents. We point out that the decomposition factor of an agent in our approach can be influenced by the reward signals of all the related agents, and we design two graph-neural-network-based algorithms (VGN-Linear and VGN-Nonlinear) to solve the value functions of each agent. It is proved theoretically that the present methods satisfy the factorizable condition in the centralized training process. The performance of the present methods is evaluated on the StarCraft Multiagent Challenge (SMAC) benchmark. Experimental results show that our method outperforms state-of-the-art value-based multiagent reinforcement learning algorithms, especially on very hard tasks that are challenging for existing methods.
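To make the idea concrete, the following is a minimal sketch (not the authors' released code) of a graph-attention value-mixing step in the spirit of VGN: per-agent Q-values are combined into a joint value with non-negative attention weights derived from agent embeddings, so that each agent's decomposition factor depends on the related agents and the monotonic factorization condition holds by construction. All module names, tensor shapes, and the adjacency handling are illustrative assumptions.

```python
# Hedged sketch of a graph-attention mixing network; all names and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionMixer(nn.Module):
    def __init__(self, n_agents: int, obs_dim: int, embed_dim: int = 32):
        super().__init__()
        self.key = nn.Linear(obs_dim, embed_dim)    # per-agent key embedding
        self.query = nn.Linear(obs_dim, embed_dim)  # per-agent query embedding

    def forward(self, agent_qs, obs, adj):
        # agent_qs: (batch, n_agents) individual Q-values
        # obs:      (batch, n_agents, obs_dim) local observations
        # adj:      (n_agents, n_agents) 0/1 adjacency between related agents
        #           (assumed to include self-loops so every row has a neighbour)
        k = self.key(obs)                            # (B, N, E)
        q = self.query(obs)                          # (B, N, E)
        scores = torch.einsum("bie,bje->bij", q, k)  # pairwise attention logits
        scores = scores.masked_fill(adj == 0, float("-inf"))
        attn = F.softmax(scores, dim=-1)             # non-negative mixing weights
        # each agent's decomposition factor aggregates its related agents' values
        mixed = torch.einsum("bij,bj->bi", attn, agent_qs)
        return mixed.sum(dim=-1, keepdim=True)       # joint Q_tot, shape (B, 1)

# Usage sketch: the joint output would be trained against the team TD target.
# mixer = GraphAttentionMixer(n_agents=5, obs_dim=16)
# q_tot = mixer(agent_qs, obs, adj)
```

Because the softmax weights are non-negative, the joint value is monotonically non-decreasing in each individual Q-value, which is the kind of factorizable condition the abstract refers to; the actual VGN-Linear and VGN-Nonlinear constructions in the paper may differ in detail.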