Shi Daming, Tong Junbo, Liu Yi, Fan Wenhui
Department of Automation, Tsinghua University, Beijing 100084, China.
Entropy (Basel). 2022 Mar 28;24(4):470. doi: 10.3390/e24040470.
With the development and application of multi-agent systems, multi-agent cooperation is becoming an important problem in artificial intelligence. Multi-agent reinforcement learning (MARL) is one of the most effective methods for solving multi-agent cooperative tasks. However, the high sample complexity of traditional reinforcement learning leads to two kinds of training waste in MARL for cooperative tasks: all homogeneous agents are trained independently and repetitively, and a multi-agent system must be trained from scratch when a new teammate is added. To tackle these two problems, we propose knowledge reuse methods for MARL. On the one hand, this paper proposes sharing experience and policies among agents to mitigate training waste. On the other hand, this paper proposes reusing the policies learned by the original team to avoid knowledge waste when a new agent is added. Experiments on the Pursuit task demonstrate that sharing experience and policies accelerates training and improves performance simultaneously. Additionally, transferring the policies learned by the N-agent team enables the (N+1)-agent team to immediately perform the cooperative task successfully, and only minimal additional training is needed for the multi-agent team to reach optimal performance identical to that of training from scratch.
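The experience-sharing idea for homogeneous agents can be illustrated with a minimal sketch: instead of each agent maintaining its own replay buffer, all agents write transitions into one pooled buffer that any agent's policy update can sample from. The class and method names below (`SharedReplayBuffer`, `add`, `sample`) are hypothetical illustrations, not the paper's actual implementation.

```python
import random
from collections import deque

class SharedReplayBuffer:
    """Sketch of a pooled buffer: all homogeneous agents deposit
    transitions here, so each policy update draws on the combined
    experience of the whole team rather than one agent's alone."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        # Homogeneous agents can share one policy, so their
        # transitions are interchangeable and simply pooled.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling over the pooled experience.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

# Three agents each log 5 transitions; one update then samples across all of them.
buf = SharedReplayBuffer()
for agent_id in range(3):
    for t in range(5):
        buf.add(state=t, action=0, reward=1.0, next_state=t + 1, done=False)

batch = buf.sample(8)
print(len(buf.buffer), len(batch))  # 15 pooled transitions, batch of 8
```

In this scheme each agent contributes data at its own interaction rate, but every gradient step benefits from roughly N times as many samples, which is the intuition behind the training speed-up reported on the Pursuit task.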