Xu Xing, Li Rongpeng, Zhao Zhifeng, Zhang Honggang
IEEE Trans Neural Netw Learn Syst. 2022 Sep;33(9):4285-4299. doi: 10.1109/TNNLS.2021.3056418. Epub 2022 Aug 31.
With the rapid evolution of wireless mobile devices, there is a growing need for effective collaboration mechanisms that allow intelligent agents to gradually approach a collective objective by continuously learning from the environment based on their individual observations. In this regard, independent reinforcement learning (IRL) is often deployed in multiagent collaboration to alleviate the problem of a nonstationary learning environment. However, the behavioral strategies of intelligent agents in IRL can be formulated only from their local, individual observations of the global environment, so appropriate communication mechanisms must be introduced to reduce this behavioral locality. In this article, we address the problem of communication between intelligent agents in IRL by jointly adopting mechanisms at two different scales. At the large scale, we introduce the stigmergy mechanism as an indirect communication bridge between independent learning agents and carefully design a mathematical method to quantify the impact of the digital pheromone. At the small scale, we propose a conflict-avoidance mechanism between adjacent agents, implemented by an additional embedded neural network that grants more action opportunities to participants with higher action priorities. In addition, we present a federal training method to effectively optimize the neural network of each agent in a decentralized manner. Finally, we establish a simulation scenario in which a number of mobile agents in a given area move automatically to form a specified target shape. Extensive simulations demonstrate the effectiveness of the proposed method.
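To make the stigmergy idea concrete: agents deposit a digital pheromone into a shared environment, and the traces evaporate and diffuse over time so that other agents can sense them as part of their local observations. The sketch below illustrates this indirect-communication pattern on a grid; the grid representation, deposit amount, and the evaporation/diffusion rates are illustrative assumptions, not the paper's exact mathematical formulation.

```python
import numpy as np

class PheromoneField:
    """Minimal digital-pheromone grid for stigmergic (indirect)
    communication between independent learning agents.

    The deposit amount, evaporation rate, and diffusion rate are
    assumed values for illustration only.
    """

    def __init__(self, height, width, evaporation=0.05, diffusion=0.1):
        self.field = np.zeros((height, width))
        self.evaporation = evaporation  # fraction of pheromone lost per step
        self.diffusion = diffusion      # fraction spread to the 4 neighbors

    def deposit(self, row, col, amount=1.0):
        # An agent marks its current cell; other agents later sense this
        # trace instead of exchanging messages directly.
        self.field[row, col] += amount

    def step(self):
        # Evaporation: old traces fade, so stale information loses influence.
        self.field *= (1.0 - self.evaporation)
        # Diffusion: each cell sends 1/4 of its outgoing share to each of its
        # 4 neighbors, so agents can sense a trace without standing on it.
        spread = self.diffusion * self.field
        padded = np.pad(spread, 1)
        incoming = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                    padded[1:-1, :-2] + padded[1:-1, 2:])
        self.field += incoming / 4.0 - spread

    def sense(self, row, col):
        # Local pheromone reading an agent would append to its RL state.
        return self.field[row, col]
```

Under this sketch, each agent's policy input concatenates its own observation with `sense()` readings from nearby cells, so coordination emerges through the shared field rather than through direct messaging.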
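The abstract does not give the exact update rule of the federal training method, only that each agent's network is optimized in a decentralized manner. One common way to realize such training is periodic parameter averaging across agents, in the spirit of federated averaging; the sketch below shows a single averaging round under that assumption, and the paper's actual aggregation rule may differ.

```python
import torch

def federal_average(models):
    """One round of decentralized parameter sharing: average the agents'
    network parameters and load the average back into every agent.

    A hedged sketch assuming all parameters are floating point; it is not
    the paper's exact federal training procedure.
    """
    state_dicts = [m.state_dict() for m in models]
    averaged = {
        key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }
    for m in models:
        m.load_state_dict(averaged)
```

Between averaging rounds, each agent would continue training locally on its own observations, so no raw experience needs to be exchanged among agents.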