Xiao Qian, Pan Tengteng, Wang Kexin, Cui Shuoming
School of Intelligent Science and Information Engineering, Shenyang University, Shenyang 110044, China.
Sensors (Basel). 2025 Jul 29;25(15):4685. doi: 10.3390/s25154685.
Traditional deep reinforcement learning methods suffer from slow convergence and poor adaptability in complex environments and are prone to falling into local optima in AGV (automated guided vehicle) system applications. To address these issues, this paper proposes an adaptive path planning algorithm based on an improved Deep Q-Network, called B-PER DQN. First, a dynamic temperature adjustment mechanism is constructed: the temperature parameter of the Boltzmann exploration strategy is adapted by analyzing the trend of rewards over a recent window. Second, a prioritized experience replay (PER) mechanism is introduced, improving training efficiency and task diversity through graded experience sampling and random obstacle configurations. Third, a refined multi-objective reward function combining direction guidance, a step penalty, and an endpoint reward is designed to guide the agent toward learning efficient paths. Experimental results show that, compared with baseline algorithms, the proposed method achieves a higher success rate and faster convergence in the same environments, offering an efficient and adaptive reinforcement learning solution for path planning in complex environments.
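The abstract describes adapting the Boltzmann temperature from the trend of a recent reward window. The sketch below illustrates one plausible realization of that idea; the window split, the cooling/heating factors, and the temperature bounds are illustrative assumptions, not values from the paper.

```python
import numpy as np

def boltzmann_action(q_values, temperature):
    """Sample an action from a Boltzmann (softmax) distribution over Q-values."""
    logits = np.asarray(q_values, dtype=float) / temperature
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return np.random.choice(len(probs), p=probs)

def adapt_temperature(temperature, reward_window,
                      cool=0.95, heat=1.05, t_min=0.1, t_max=5.0):
    """Adjust the exploration temperature from the trend of recent rewards.

    If rewards over the recent half of the window improve on the earlier
    half, cool down (exploit more); otherwise heat up (explore more).
    """
    w = np.asarray(reward_window, dtype=float)
    if len(w) < 2:                               # not enough history yet
        return temperature
    half = len(w) // 2
    improving = w[half:].mean() > w[:half].mean()
    temperature *= cool if improving else heat
    return float(np.clip(temperature, t_min, t_max))
```

In training, one would keep a fixed-length deque of recent episode returns and call `adapt_temperature` once per episode before the next round of `boltzmann_action` selections.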
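The multi-objective reward combines the three terms the abstract names: direction guidance, a step penalty, and an endpoint reward. A minimal sketch follows; the coefficient values and the use of Euclidean distance for the guidance term are assumptions for illustration.

```python
import numpy as np

def shaped_reward(pos, next_pos, goal,
                  guide_scale=1.0, step_penalty=-0.1, goal_bonus=10.0):
    """Multi-objective reward: direction guidance + step penalty + endpoint bonus.

    The guidance term rewards the reduction in distance to the goal after a
    move, the step penalty discourages long paths, and reaching the goal
    yields a terminal bonus.
    """
    if np.array_equal(next_pos, goal):
        return goal_bonus
    d_before = np.linalg.norm(np.asarray(pos) - np.asarray(goal))
    d_after = np.linalg.norm(np.asarray(next_pos) - np.asarray(goal))
    return guide_scale * (d_before - d_after) + step_penalty
```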