Chen Guangda, Yao Shunyi, Ma Jun, Pan Lifan, Chen Yu'an, Xu Pei, Ji Jianmin, Chen Xiaoping
School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China.
School of Data Science, University of Science and Technology of China, Hefei 230026, China.
Sensors (Basel). 2020 Aug 27;20(17):4836. doi: 10.3390/s20174836.
It is challenging for multiple robots of different shapes to avoid obstacles safely and efficiently in distributed, communication-free scenarios, where robots do not communicate with each other and only sense the positions of other robots and the obstacles around them. Most existing multi-robot collision avoidance systems either require communication between robots or require expensive movement data of other robots, such as velocities, accelerations, and paths. In this paper, we propose a map-based deep reinforcement learning approach for multi-robot collision avoidance in a distributed and communication-free environment. We use a robot's egocentric local grid map to represent the environmental information around it, including its own shape and the observable appearances of other robots and obstacles; such a map can be easily generated using multiple sensors or sensor fusion. We then apply the distributed proximal policy optimization (DPPO) algorithm to train a convolutional neural network that directly maps three frames of egocentric local grid maps, together with the robot's relative local goal position, into low-level robot control commands. Compared to other methods, the map-based approach is more robust to noisy sensor data, does not require other robots' movement data, and accounts for the sizes and shapes of the robots involved, which makes it more efficient and easier to deploy on real robots. We first train the neural network with DPPO in a dedicated multi-robot simulator, using a multi-stage curriculum learning strategy over multiple scenarios to improve performance. We then deploy the trained model to real robots to perform collision avoidance during navigation without tedious parameter tuning. We evaluate the approach in multiple scenarios, both in the simulator and on four differential-drive mobile robots in the real world. Both qualitative and quantitative experiments show that our approach is efficient and outperforms existing DRL-based approaches on many metrics. We also conduct ablation studies showing the positive effects of using egocentric grid maps and multi-stage curriculum learning.
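To make the observation format concrete, the following is a minimal sketch of building one egocentric local grid map and stacking three consecutive frames, as the abstract describes. The parameter values (a 60×60 grid at 0.1 m resolution) and the `egocentric_grid_map` helper are illustrative assumptions, not details taken from the paper.

```python
def egocentric_grid_map(points, size=60, resolution=0.1):
    """Rasterize sensed obstacle points (in the robot's frame, metres)
    into a size x size occupancy grid centred on the robot.
    Grid dimensions and resolution are assumed, not from the paper."""
    grid = [[0.0] * size for _ in range(size)]
    half = size * resolution / 2.0  # half-width of the map in metres
    for x, y in points:
        # Map metric coordinates to grid indices (x forward, y left).
        col = int((x + half) / resolution)
        row = int((half - y) / resolution)
        if 0 <= row < size and 0 <= col < size:
            grid[row][col] = 1.0  # mark the cell as occupied
    return grid

# The policy network's map input stacks three consecutive frames;
# here the same single sensed point stands in for three time steps.
frames = [egocentric_grid_map([(1.0, 0.5)]) for _ in range(3)]
```

In the full system, this stacked-map tensor plus the relative local goal position would be fed to the convolutional policy network, which outputs low-level control commands (e.g., linear and angular velocities for a differential-drive robot).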