Liu Wenxing, Niu Hanlin, Jang Inmo, Herrmann Guido, Carrasco Joaquin
IEEE Trans Neural Netw Learn Syst. 2024 Feb;35(2):2732-2746. doi: 10.1109/TNNLS.2022.3191021. Epub 2024 Feb 5.
In this article, we propose an algorithm that combines an actor-critic-based off-policy method with consensus-based distributed training to address multiagent deep reinforcement learning problems. Specifically, we develop a convergence analysis, via a Lyapunov method, of a consensus algorithm for a class of nonlinear systems, and we use this result to analyze the convergence of the actor and critic training parameters in our algorithm. The analysis verifies that all agents converge to the same optimal model as the training time goes to infinity. To validate the implementation of our algorithm, we propose a multiagent training framework that trains each Universal Robot 5 (UR5) robot arm to reach a random target position. Finally, experiments demonstrate the effectiveness and feasibility of the proposed algorithm.
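The consensus mechanism the abstract refers to can be illustrated with a minimal sketch: each agent repeatedly replaces its parameters with a weighted average of its neighbors' parameters, and under a connected, doubly stochastic weight matrix all agents approach a common value. The function name, the 4-agent ring topology, and the scalar parameters below are illustrative assumptions, not the paper's exact formulation (which couples this averaging with local actor-critic gradient updates).

```python
# Sketch of a consensus averaging round for distributed training.
# Assumed setup: 4 agents on a ring, each holding a scalar "parameter";
# W is doubly stochastic, so repeated averaging drives all agents to
# the average of their initial values.

def consensus_step(thetas, W):
    """One round of averaging: theta_i <- sum_j W[i][j] * theta_j."""
    n = len(thetas)
    return [sum(W[i][j] * thetas[j] for j in range(n)) for i in range(n)]

# Each agent mixes with itself and its two ring neighbors.
W = [
    [0.5, 0.25, 0.0, 0.25],
    [0.25, 0.5, 0.25, 0.0],
    [0.0, 0.25, 0.5, 0.25],
    [0.25, 0.0, 0.25, 0.5],
]
thetas = [1.0, -2.0, 3.0, 0.5]  # each agent's initial parameter
for _ in range(100):
    thetas = consensus_step(thetas, W)
# all agents converge toward the initial average (0.625 here)
```

In the full algorithm, a step like this would alternate with each agent's own off-policy actor-critic gradient update, so the shared limit is an optimal model rather than a plain average of initial parameters.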