College of Systems Engineering, National University of Defense Technology, Changsha 410073, China.
Sensors (Basel). 2019 Sep 5;19(18):3837. doi: 10.3390/s19183837.
In this paper, we propose a novel Deep Reinforcement Learning (DRL) algorithm that navigates non-holonomic robots with continuous control in unknown dynamic environments with moving obstacles. We call the approach MK-A3C (Memory and Knowledge-based Asynchronous Advantage Actor-Critic) for short. As its first component, MK-A3C builds a GRU-based memory neural network to enhance the robot's capability for temporal reasoning. Robots without such memory tend to behave irrationally in the face of incomplete and noisy state estimates in complex environments, whereas robots endowed with memory by MK-A3C can escape local-minimum traps by implicitly estimating the environment model. Secondly, MK-A3C combines a domain-knowledge-based reward function with a transfer-learning-based training task architecture, which solves the policy non-convergence problem caused by sparse rewards. Together, these improvements allow MK-A3C to navigate robots efficiently in unknown dynamic environments, satisfying kinematic constraints while handling moving obstacles. Simulation experiments show that, compared with existing methods, MK-A3C achieves successful robot navigation in unknown and challenging environments by outputting continuous acceleration commands.
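To make the two components concrete, below is a minimal PyTorch sketch of a GRU-based actor-critic network for continuous acceleration control, together with an illustrative dense reward-shaping function. All layer sizes, the observation layout, and the reward coefficients are our assumptions for illustration; they are not the architecture or reward published for MK-A3C.

```python
import torch
import torch.nn as nn

class RecurrentActorCritic(nn.Module):
    """GRU-based actor-critic for continuous acceleration control.

    Observation layout (e.g., range readings + goal-relative pose +
    current velocity) and all layer sizes are illustrative assumptions.
    """

    def __init__(self, obs_dim=64, hidden_dim=128, act_dim=2):
        super().__init__()
        # Encode the raw observation into a feature vector.
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        # GRU memory: integrates past observations so the policy can
        # reason over time despite partial, noisy sensing.
        self.gru = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        # Actor head: mean of a Gaussian over (linear, angular) acceleration.
        self.mu = nn.Linear(hidden_dim, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))
        # Critic head: state-value estimate used by the A3C update.
        self.value = nn.Linear(hidden_dim, 1)

    def forward(self, obs_seq, h0=None):
        # obs_seq: (batch, time, obs_dim); h0: (1, batch, hidden_dim)
        x = self.encoder(obs_seq)
        out, hn = self.gru(x, h0)
        mu = torch.tanh(self.mu(out))          # bounded accelerations
        std = self.log_std.exp().expand_as(mu)
        return mu, std, self.value(out), hn

def shaped_reward(dist_to_goal, prev_dist, min_obstacle_dist,
                  reached, collided):
    # Illustrative dense shaping (our assumption, not the paper's exact
    # formula): reward progress toward the goal, penalize proximity to
    # obstacles, and add sparse terminal bonuses/penalties.
    r = 2.0 * (prev_dist - dist_to_goal)       # progress term
    if min_obstacle_dist < 0.5:
        r -= 0.1 * (0.5 - min_obstacle_dist)   # safety penalty
    if reached:
        r += 10.0
    if collided:
        r -= 10.0
    return r
```

In an A3C-style setup, several asynchronous workers would each run a copy of this network, carrying the GRU hidden state hn across steps within an episode and resetting it at episode boundaries.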