Kim Taeyoung, Vecchietti Luiz Felipe, Choi Kyujin, Sariel Sanem, Har Dongsoo
Cho Chun Shik Graduate School of Green Transportation, Korea Advanced Institute of Science and Technology, Daejeon, South Korea.
Department of Computer Engineering, Istanbul Technical University, Istanbul, Turkey.
PeerJ Comput Sci. 2021 Sep 17;7:e718. doi: 10.7717/peerj-cs.718. eCollection 2021.
In multi-agent reinforcement learning, cooperative learning behavior among agents is very important. In heterogeneous multi-agent reinforcement learning, cooperative behavior among different types of agents in a group is pursued. Learning a joint-action set during centralized training is an attractive way to obtain such cooperative behavior; however, this method yields limited learning performance with heterogeneous agents. To improve the learning performance of heterogeneous agents during centralized training, a two-stage heterogeneous centralized training scheme that allows the training of multiple roles of heterogeneous agents is proposed. During training, two training processes are conducted in series. The first stage trains each agent according to its role, aiming to maximize individual role rewards. The second stage trains the agents as a whole so that they learn cooperative behaviors while attempting to maximize shared collective rewards, i.e., team rewards. Because these two training processes are conducted in series at every time step, agents can learn how to maximize role rewards and team rewards simultaneously. The proposed method is applied to 5 versus 5 AI robot soccer for validation. The experiments are performed in a robot soccer environment built with the Webots robot simulation software. Simulation results show that the proposed method trains the robots of the robot soccer team effectively, achieving higher role rewards and higher team rewards than three other approaches that can be used to solve cooperative multi-agent training problems. Quantitatively, a team trained by the proposed method improves the score-concede rate by 5% to 30% in matches against evaluation teams, compared with teams trained by the other approaches.
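The two-stage training loop described above (a role-reward update followed by a team-reward update at every time step) can be sketched with a toy example. This is a minimal illustration under stated assumptions, not the paper's actual implementation: each agent's policy is reduced to a single scalar parameter, the role reward pulls an agent toward a role-specific target, the team reward rewards coherence across agents, and the names (`RoleAgent`, `role_reward`, `team_reward`, the learning rates) are all hypothetical.

```python
# Toy sketch of two-stage heterogeneous centralized training.
# Assumptions (not from the paper): scalar "policies", quadratic rewards,
# finite-difference gradient ascent, illustrative learning rates.

class RoleAgent:
    """Toy agent: one scalar parameter stands in for a policy network."""
    def __init__(self, role, target):
        self.role = role          # e.g. "attacker", "defender"
        self.param = 0.0
        self.target = target      # role-specific optimum (assumed)

    def role_reward(self):
        # Higher when the agent's parameter is close to its role target.
        return -(self.param - self.target) ** 2

def team_reward(agents):
    # Shared reward: higher when agents' parameters stay close to their
    # mean (a stand-in for coordinated team behavior).
    mean = sum(a.param for a in agents) / len(agents)
    return -sum((a.param - mean) ** 2 for a in agents)

def two_stage_step(agents, lr_role=0.1, lr_team=0.05, eps=1e-3):
    # Stage 1: each agent updates toward maximizing its own role reward.
    for a in agents:
        base = a.role_reward()
        a.param += eps
        grad = (a.role_reward() - base) / eps   # finite-difference gradient
        a.param -= eps
        a.param += lr_role * grad
    # Stage 2: agents update toward maximizing the shared team reward.
    for a in agents:
        base = team_reward(agents)
        a.param += eps
        grad = (team_reward(agents) - base) / eps
        a.param -= eps
        a.param += lr_team * grad

# Both updates run in series at every step, so role and team objectives
# are pursued simultaneously rather than in separate training phases.
agents = [RoleAgent("attacker", 1.0), RoleAgent("defender", -1.0)]
for _ in range(200):
    two_stage_step(agents)
```

In this toy setup the agents settle between their role targets and the team mean, which mirrors the trade-off the abstract describes: each agent fulfills its role while remaining coordinated with the team.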