Liu Xiangyu, Tan Ying
IEEE Trans Neural Netw Learn Syst. 2023 Oct;34(10):7775-7783. doi: 10.1109/TNNLS.2022.3146201. Epub 2023 Oct 5.
In this article, we investigate how multiple agents can learn to coordinate for efficient exploration in reinforcement learning. Although independent exploration is straightforward, covering the joint action space becomes exponentially harder as the number of agents increases. To tackle this problem, we propose feudal latent-space exploration (FLE) for multi-agent reinforcement learning (MARL). FLE introduces a feudal commander that learns a low-dimensional global latent structure, which guides the agents to explore in a coordinated manner. Under this framework, a multi-agent policy gradient (PG) is adopted to optimize both the agent policies and the latent structure end to end. We demonstrate the effectiveness of this method in two multi-agent environments that require explicit coordination. Experimental results validate that FLE outperforms baseline MARL approaches that use an independent exploration strategy in terms of mean reward, efficiency, and the expressiveness of the coordination policies.
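The abstract does not give implementation details, so the following PyTorch-style sketch is only one plausible reading of the idea: a commander maps the global state to a low-dimensional latent, each decentralized policy conditions on its own observation plus that shared latent, and a REINFORCE-style policy gradient trains both end to end. The class names, network sizes, and training loop are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class FeudalCommander(nn.Module):
    """Maps the global state to a low-dimensional latent shared by all agents."""
    def __init__(self, state_dim, latent_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, latent_dim))

    def forward(self, global_state):
        return self.net(global_state)


class AgentPolicy(nn.Module):
    """Per-agent policy conditioned on its local observation and the shared latent."""
    def __init__(self, obs_dim, latent_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + latent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, obs, latent):
        return Categorical(logits=self.net(torch.cat([obs, latent], dim=-1)))


def pg_update(commander, policies, optimizer, trajectory, gamma=0.99):
    """One end-to-end policy-gradient step over a sampled episode.

    trajectory: list of (global_state, [obs_i], [action_i], team_reward) tuples.
    Gradients flow through both the agent policies and the commander's latent.
    """
    # Discounted returns for the shared team reward.
    returns, g = [], 0.0
    for *_, r in reversed(trajectory):
        g = r + gamma * g
        returns.insert(0, g)

    loss = 0.0
    for (state, obs_list, act_list, _), ret in zip(trajectory, returns):
        latent = commander(state)                  # shared latent for this step
        for pi, obs, act in zip(policies, obs_list, act_list):
            logp = pi(obs, latent).log_prob(act)   # per-agent log-probability
            loss = loss - logp * ret               # REINFORCE objective

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

A single optimizer over the commander and all agent parameters keeps the update end to end, matching the abstract's description at a high level; any actor-critic variant of the multi-agent PG would slot into `pg_update` in the same way.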