Li Jinming, Liu Qingshan, Chi Guoyi
School of Mathematics, Southeast University, Nanjing 210096, China.
School of Mathematics, Southeast University, Nanjing 210096, China; Purple Mountain Laboratories, Nanjing 211111, China.
Neural Netw. 2024 Mar;171:61-72. doi: 10.1016/j.neunet.2023.11.063. Epub 2023 Dec 1.
Improving generalization ability in multi-robot formation can reduce repeated training and computation. In this paper, we study the multi-robot formation problem with the ability to generalize over target positions. Since the generalization ability of a neural network is directly proportional to the spatial dimension, we use different networks for different objectives, so that each network focuses on learning a single objective and achieves better performance. In addition, this paper presents a distributed deep reinforcement learning method, based on the soft actor-critic algorithm, for solving the multi-robot formation problem. A formation evaluation assignment function is also designed to suit distributed training. Compared with the original algorithm, the improved algorithm obtains higher cumulative rewards. The experimental results show that the proposed algorithm better maintains the desired formation while moving, and that the rotation design in the reward function gives the multi-robot system greater flexibility in formation. A comparison of the control signal curves shows that the proposed algorithm is more stable. Finally, the experiments demonstrate the universality of the proposed algorithm in formation maintenance and formation variation.
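The abstract credits a rotation term in the reward function for the added flexibility in formation but does not give its form. As an illustration only, the following minimal sketch shows one way a formation reward could tolerate rotation: it scores the robots' positions against the desired shape after removing the best-fitting translation and 2D rotation. The function names `formation_error` and `formation_reward`, the `scale` parameter, and the alignment formula are assumptions made for this example; they are not taken from the paper, and the sketch does not model the paper's formation evaluation assignment function.

```python
import math


def formation_error(actual, desired):
    """RMS residual between actual robot positions and the desired formation
    after the best-fitting translation and rotation are removed.

    Hypothetical helper for illustration; not the paper's reward design.
    Both arguments are equal-length lists of (x, y) tuples, robot i in
    `actual` corresponding to slot i in `desired`.
    """
    n = len(actual)
    if n == 0 or n != len(desired):
        raise ValueError("actual and desired must be non-empty and equal length")

    # Remove translation by centering both point sets on their centroids.
    acx = sum(x for x, _ in actual) / n
    acy = sum(y for _, y in actual) / n
    dcx = sum(x for x, _ in desired) / n
    dcy = sum(y for _, y in desired) / n
    p = [(x - acx, y - acy) for x, y in actual]
    q = [(x - dcx, y - dcy) for x, y in desired]

    # Closed-form optimal 2D rotation of q onto p:
    # maximize sum_i p_i . R(theta) q_i = a*cos(theta) + b*sin(theta).
    a = sum(px * qx + py * qy for (px, py), (qx, qy) in zip(p, q))
    b = sum(qx * py - qy * px for (px, py), (qx, qy) in zip(p, q))
    theta = math.atan2(b, a)
    c, s = math.cos(theta), math.sin(theta)

    # Residual after rotating the desired shape by theta.
    sq = 0.0
    for (px, py), (qx, qy) in zip(p, q):
        rx = c * qx - s * qy
        ry = s * qx + c * qy
        sq += (px - rx) ** 2 + (py - ry) ** 2
    return math.sqrt(sq / n)


def formation_reward(actual, desired, scale=1.0):
    """Non-positive reward that is highest (zero) when the formation shape
    matches exactly, regardless of where it is or how it is rotated."""
    return -scale * formation_error(actual, desired)


if __name__ == "__main__":
    desired = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    # Same triangle rotated by 90 degrees and shifted by (5, 5).
    moved = [(5.0, 5.0), (5.0, 6.0), (4.0, 5.0)]
    # Triangle with one side stretched, so the shape itself is wrong.
    stretched = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
    print(formation_reward(moved, desired))
    print(formation_reward(stretched, desired))
```

Under these assumptions, a formation that has only rotated or translated is not penalized, while a distorted one is. The sketch fixes the robot-to-slot correspondence in advance and allows only proper rotations, not reflections.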