Deguale Demelash Abiye, Yu Lingli, Sinishaw Melikamu Liyih, Li Keyi
School of Automation, Central South University, Changsha 410083, China.
School of Computer Science and Engineering, Central South University, Changsha 410083, China.
Sensors (Basel). 2024 Feb 27;24(5):1523. doi: 10.3390/s24051523.
Path planning for mobile robots in complex environments remains a challenging problem. This work introduces an improved deep reinforcement learning strategy for robot navigation that combines a dueling architecture, prioritized experience replay, and a shaped reward function. In a grid world and two Gazebo simulation environments with static and dynamic obstacles, the Dueling Deep Q-Network with Modified Rewards and Prioritized Experience Replay (PMR-Dueling DQN) algorithm is compared against Q-learning, DQN, and DDQN in terms of path optimality, collision avoidance, and learning speed. To encourage optimal routes, the shaped reward function takes into account target direction, obstacle avoidance, and distance. Prioritized replay concentrates training on important transitions, while the dueling architecture separates value and advantage learning. The results show that PMR-Dueling DQN substantially improves convergence speed, stability, and overall performance across conditions, achieving higher cumulative rewards in both the grid world and Gazebo environments. By combining deep reinforcement learning with reward design, network architecture, and experience replay, PMR-Dueling DQN surpasses traditional approaches to robot path planning in complex environments.
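The abstract does not include code; the following is a minimal, generic sketch of the two ideas it names, a dueling Q-network head (value and advantage streams recombined) and a shaped reward mixing goal progress, heading, and obstacle proximity. All layer sizes, weights, and function names here are illustrative assumptions, not the authors' implementation, and the prioritized replay buffer is omitted.

```python
import torch
import torch.nn as nn


class DuelingQNetwork(nn.Module):
    """Generic dueling head: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)."""

    def __init__(self, state_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)                 # state-value stream V(s)
        self.advantage = nn.Linear(hidden, num_actions)   # advantage stream A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.trunk(state)
        v = self.value(h)
        a = self.advantage(h)
        # Subtracting the mean advantage keeps V and A identifiable.
        return v + a - a.mean(dim=-1, keepdim=True)


def shaped_reward(prev_dist, dist_to_goal, heading_error, min_obstacle_dist,
                  reached_goal=False, collided=False,
                  w_progress=1.0, w_heading=0.1, safe_dist=0.5):
    """Illustrative shaped reward: progress toward the goal, heading alignment,
    and an obstacle-proximity penalty (all weights are assumed values)."""
    if collided:
        return -100.0
    if reached_goal:
        return 100.0
    r = w_progress * (prev_dist - dist_to_goal)   # reward distance reduction
    r -= w_heading * abs(heading_error)           # penalize facing away from the goal
    if min_obstacle_dist < safe_dist:             # penalize getting too close to obstacles
        r -= (safe_dist - min_obstacle_dist)
    return r
```

In such a setup, the dueling head replaces the final layer of a standard DQN, while the shaped reward is evaluated at every environment step from the robot's pose and range readings; the specific terms and weights above are placeholders for the design described in the paper.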