Ken Ming Lee, Sriram Ganapathi Subramanian, Mark Crowley
Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada.
Front Artif Intell. 2022 Sep 20;5:805823. doi: 10.3389/frai.2022.805823. eCollection 2022.
Independent reinforcement learning algorithms have no theoretical guarantees for finding the best policy in multi-agent settings. However, in practice, prior works have reported good performance with independent algorithms in some domains and bad performance in others. Moreover, a comprehensive study of the strengths and weaknesses of independent algorithms is lacking in the literature. In this paper, we carry out an empirical comparison of the performance of independent algorithms on seven PettingZoo environments that span the three main categories of multi-agent environments, i.e., cooperative, competitive, and mixed. For the cooperative setting, we show that independent algorithms can perform on par with multi-agent algorithms in fully observable environments, while adding recurrence improves the learning of independent algorithms in partially observable environments. In the competitive setting, independent algorithms can perform on par with, or better than, multi-agent algorithms, even in more challenging environments. We also show that agents trained with independent algorithms learn to perform well individually, but fail to learn to cooperate with allies and compete with enemies in mixed environments.
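To make the "independent algorithms" setup concrete, here is a minimal sketch of independent learning under PettingZoo's agent-iteration (AEC) API. Each agent keeps its own learner and its own experience, with no parameter sharing and no centralized critic. The choice of simple_spread_v3 as the example environment, the RandomLearner placeholder class, and all hyperparameters are illustrative assumptions, not details from the paper; version suffixes on PettingZoo environments also vary across releases.

```python
# Sketch of independent learning in PettingZoo's AEC API. Each agent
# owns a separate learner, so every other agent is treated as part of
# the environment. `RandomLearner` is a hypothetical placeholder for
# whatever single-agent algorithm is plugged in (e.g., an independent
# DQN or PPO); it acts randomly and only records transitions.
from pettingzoo.mpe import simple_spread_v3  # a cooperative MPE task


class RandomLearner:
    """Placeholder single-agent learner: random actions, private buffer."""

    def __init__(self, action_space):
        self.action_space = action_space
        self.buffer = []  # per-agent experience; a real learner trains on this

    def act(self, observation):
        return self.action_space.sample()  # replace with a learned policy

    def observe(self, observation, action, reward):
        self.buffer.append((observation, action, reward))


env = simple_spread_v3.env(max_cycles=25)
env.reset(seed=0)

# One independent learner per agent -- the defining trait of this setup.
learners = {agent: RandomLearner(env.action_space(agent)) for agent in env.agents}

for agent in env.agent_iter():
    observation, reward, termination, truncation, _ = env.last()
    if termination or truncation:
        action = None  # PettingZoo requires stepping None for finished agents
    else:
        action = learners[agent].act(observation)
        learners[agent].observe(observation, action, reward)
    env.step(action)
env.close()
```

Because each learner updates while the others are also changing, the environment is non-stationary from any single agent's point of view, which is why the convergence guarantees of single-agent reinforcement learning do not carry over to this setting.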