Jeng Shyr-Long, Chiang Chienhsun
Department of Mechanical Engineering, Lunghwa University of Science and Technology, Taoyuan City 333326, Taiwan.
Department of Mechanical Engineering, National Yang Ming Chiao Tung University, Hsinchu City 300093, Taiwan.
Sensors (Basel). 2023 Oct 23;23(20):8651. doi: 10.3390/s23208651.
An end-to-end approach to autonomous navigation that is based on deep reinforcement learning (DRL) with a survival penalty function is proposed in this paper. Two actor-critic (AC) frameworks, namely, deep deterministic policy gradient (DDPG) and twin-delayed DDPG (TD3), are employed to enable a nonholonomic wheeled mobile robot (WMR) to perform navigation in dynamic environments containing obstacles and for which no maps are available. A comprehensive reward based on the survival penalty function is introduced; this approach effectively solves the sparse reward problem and enables the WMR to move toward its target. Consecutive episodes are connected to increase the cumulative penalty for scenarios involving obstacles; this method prevents training failure and enables the WMR to plan a collision-free path. Simulations are conducted for four scenarios (movement in an obstacle-free space, in a parking lot, at an intersection without and with a central obstacle, and in a multiple-obstacle space) to demonstrate the efficiency and operational safety of our method. For the same navigation environment, compared with the DDPG algorithm, the TD3 algorithm exhibits faster numerical convergence and higher stability in the training phase, as well as a higher task execution success rate in the evaluation phase.
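The abstract does not give the exact form of the survival-penalty-based reward, so the following Python sketch only illustrates the general idea of combining a terminal goal bonus, a collision penalty, a progress term, and a small per-step "survival" penalty. All function names, arguments, and coefficients here are illustrative assumptions, not the paper's actual formulation.

```python
def survival_penalty_reward(dist_to_goal, prev_dist_to_goal, min_obstacle_dist,
                            reached_goal, collided,
                            goal_bonus=100.0, collision_penalty=-100.0,
                            step_penalty=-0.1, safe_dist=0.5, progress_gain=10.0):
    """Dense reward sketch: goal bonus, collision penalty, progress toward the
    target, a small per-step survival penalty, and an obstacle-proximity term.
    Illustrative only; coefficients and terms are assumptions."""
    if reached_goal:
        return goal_bonus                      # terminal success bonus
    if collided:
        return collision_penalty               # terminal failure penalty
    # progress term: positive when the robot moves closer to the target
    reward = progress_gain * (prev_dist_to_goal - dist_to_goal)
    # survival penalty: small negative reward every step, discouraging stalling
    reward += step_penalty
    # proximity penalty: grows as the robot approaches an obstacle
    if min_obstacle_dist < safe_dist:
        reward -= (safe_dist - min_obstacle_dist)
    return reward
```

In such a shaping scheme, the dense progress and penalty terms replace a single sparse "reached the goal" signal, which is the sparse-reward problem the abstract refers to.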
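The abstract contrasts TD3 with DDPG but does not restate their differences; the sketch below (assuming PyTorch and placeholder module names such as actor_tgt and critic1_tgt) shows the two TD3 mechanisms usually credited for its more stable training: target-policy smoothing with clipped noise and the minimum over two target critics.

```python
import torch

def td3_target(critic1_tgt, critic2_tgt, actor_tgt, next_obs, reward, done,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    """Bootstrap target used by TD3. Unlike DDPG, the target action is
    perturbed with clipped noise and the smaller of two target critics is
    used, which curbs Q-value overestimation. Module names are placeholders."""
    with torch.no_grad():
        next_act = actor_tgt(next_obs)
        # target-policy smoothing: clipped Gaussian noise on the target action
        noise = (torch.randn_like(next_act) * noise_std).clamp(-noise_clip, noise_clip)
        next_act = (next_act + noise).clamp(-act_limit, act_limit)
        # clipped double-Q: take the minimum of the two target critics
        q1 = critic1_tgt(next_obs, next_act)
        q2 = critic2_tgt(next_obs, next_act)
        target_q = reward + gamma * (1.0 - done) * torch.min(q1, q2)
    return target_q
```

Together with delayed actor updates, these are the standard TD3 modifications; the abstract's reported gains in convergence speed and success rate over DDPG are consistent with that design, though the paper's exact hyperparameters are not given here.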