Information Science and Electrical Engineering, Shandong Jiao Tong University, Jinan 250357, China.
Institute of Automation, Shandong Academy of Sciences, Jinan 250013, China.
Sensors (Basel). 2022 May 8;22(9):3579. doi: 10.3390/s22093579.
When the traditional Deep Deterministic Policy Gradient (DDPG) algorithm is applied to mobile robot path planning, the robot's limited observable environment leads to low training efficiency and slow convergence of the path-planning model. In this paper, Long Short-Term Memory (LSTM) is introduced into the DDPG network so that the previous and current states of the mobile robot are combined to determine its actions, and a Batch Norm layer is added after each layer of the Actor network. At the same time, the reward function is optimized to guide the mobile robot to move faster towards the target point. To improve learning efficiency, the distance and angle between the mobile robot and the target point are normalized with different methods and used as the input of the DDPG network model. When the model outputs the robot's next action, mixed noise composed of Gaussian noise and Ornstein-Uhlenbeck (OU) noise is added. Finally, experiments are conducted in a simulation environment built with ROS and the Gazebo platform. The results show that the proposed algorithm accelerates the convergence of DDPG, improves the generalization ability of the path-planning model, and increases the efficiency and success rate of mobile robot path planning.
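To make the described Actor modification concrete, below is a minimal PyTorch sketch of an Actor that feeds a short sequence of states (previous and current) through an LSTM and applies a BatchNorm layer after each fully connected layer. The state dimension, action dimension, and layer widths are illustrative assumptions; the paper's exact architecture and hyperparameters are not specified in the abstract.

```python
import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    """Sketch of an LSTM-augmented DDPG Actor with BatchNorm after each
    fully connected layer. Dimensions here are assumed, not the paper's."""

    def __init__(self, state_dim=14, action_dim=2, hidden_dim=256):
        super().__init__()
        # The LSTM consumes a two-step sequence (previous state, current
        # state), so the chosen action depends on more than the
        # instantaneous observation.
        self.lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True)
        self.fc1 = nn.Linear(hidden_dim, hidden_dim)
        self.bn1 = nn.BatchNorm1d(hidden_dim)  # BatchNorm after fc1
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        self.bn2 = nn.BatchNorm1d(hidden_dim)  # BatchNorm after fc2
        self.out = nn.Linear(hidden_dim, action_dim)

    def forward(self, state_seq):
        # state_seq: (batch, seq_len=2, state_dim)
        _, (h, _) = self.lstm(state_seq)
        x = torch.relu(self.bn1(self.fc1(h[-1])))
        x = torch.relu(self.bn2(self.fc2(x)))
        # tanh bounds the action; scale to the robot's actual velocity
        # limits (e.g., linear and angular velocity) outside this module.
        return torch.tanh(self.out(x))
```

An action_dim of 2 is assumed here on the common convention that the robot is driven by a linear and an angular velocity command.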
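The mixed exploration noise and the input normalization can likewise be sketched briefly. The snippet below, using NumPy, adds Gaussian noise to an OU process and scales the distance and angle inputs to fixed ranges; the theta, sigma, and scaling constants are typical defaults assumed for illustration, and normalize_inputs is a hypothetical helper, not the paper's exact formulation.

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated noise commonly
    paired with DDPG. theta/sigma/dt are assumed typical values."""

    def __init__(self, dim, theta=0.15, sigma=0.2, dt=1e-2):
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.x = np.zeros(dim)

    def sample(self):
        # Mean-reverting update driven by Gaussian increments.
        self.x += self.theta * (-self.x) * self.dt \
            + self.sigma * np.sqrt(self.dt) * np.random.randn(*self.x.shape)
        return self.x

def mixed_noise(ou, dim, gauss_sigma=0.1):
    """Mixed noise: independent Gaussian noise plus the OU sample,
    added to the Actor's output action during exploration."""
    return np.random.randn(dim) * gauss_sigma + ou.sample()

def normalize_inputs(distance, angle, max_distance):
    """Normalize the robot-to-goal distance to [0, 1] and the relative
    angle (radians) to [-1, 1] before feeding the DDPG network."""
    return distance / max_distance, angle / np.pi
```

Using distinct scalings for distance and angle keeps both inputs in comparable numeric ranges, which is the usual motivation for normalizing network inputs and is consistent with the abstract's claim that it improves learning efficiency.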