Deep Deterministic Policy Gradient-Based Autonomous Driving for Mobile Robots in Sparse Reward Environments

Affiliations

Department of Electronic Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea.

Department of Electronic Engineering, Soonchunhyang University, Asan 31538, Republic of Korea.

Publication information

Sensors (Basel). 2022 Dec 7;22(24):9574. doi: 10.3390/s22249574.

PMID: 36559941

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9787388/

Abstract

In this paper, we propose a deep deterministic policy gradient (DDPG)-based path-planning method for mobile robots that applies the hindsight experience replay (HER) technique to overcome the performance degradation caused by the sparse reward problem in autonomous driving mobile robots. The mobile robot used in our analysis was a Robot Operating System (ROS)-based TurtleBot3, and the experiments were conducted in a Gazebo-based virtual simulation. The DDPG networks, which follow the actor-critic architecture, were implemented as fully connected neural networks, and noise was added to the actor network to encourage exploration. The robot perceived an unknown environment by measuring distances with a laser sensor and learned an optimized policy for reaching its destination. HER improved learning performance by generating three new episodes of normal experience from each failed episode. The results demonstrate that HER can help mitigate the sparse reward problem; this was further corroborated by successful autonomous driving after applying the proposed method to two reward systems, as well as by real-world experimental results.
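The HER step, turning one failed episode into three new episodes, can be sketched as follows. This is a minimal sketch, not the authors' implementation. It assumes a simplified 2-D position as the state (the paper's state also includes laser distance readings), a hypothetical sparse reward of 0 within a 0.2 m goal radius and -1 otherwise, and substitute goals sampled from positions the robot actually reached during the episode. The names `Transition`, `sparse_reward`, `relabel_episode`, `her_augment`, and `GOAL_TOLERANCE` are illustrative.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Position = Tuple[float, float]

# Hypothetical success radius in metres (an assumption, not taken from the paper).
GOAL_TOLERANCE = 0.2


@dataclass
class Transition:
    state: Position               # simplified state: robot (x, y) before the step
    action: Tuple[float, float]   # (linear, angular) velocity command
    next_state: Position          # robot (x, y) after the step
    goal: Position                # goal this transition is conditioned on
    reward: float
    done: bool


def sparse_reward(position: Position, goal: Position, tol: float = GOAL_TOLERANCE) -> float:
    """Hypothetical sparse reward: 0 when within `tol` of the goal, -1 otherwise."""
    return 0.0 if math.dist(position, goal) <= tol else -1.0


def relabel_episode(episode: List[Transition], new_goal: Position) -> List[Transition]:
    """Replay a stored episode as if `new_goal` had been the target all along."""
    relabeled = []
    for t in episode:
        r = sparse_reward(t.next_state, new_goal)
        done = r == 0.0
        relabeled.append(Transition(t.state, t.action, t.next_state, new_goal, r, done))
        if done:  # the substitute goal was reached, so the episode ends here
            break
    return relabeled


def her_augment(episode: List[Transition], num_new: int = 3,
                rng: Optional[random.Random] = None) -> List[List[Transition]]:
    """Create `num_new` relabeled episodes from one (typically failed) episode."""
    if rng is None:
        rng = random.Random()
    # Substitute goals are positions the robot actually reached during the episode.
    achieved = [t.next_state for t in episode]
    goals = rng.sample(achieved, k=min(num_new, len(achieved)))
    return [relabel_episode(episode, g) for g in goals]
```

Because each substitute goal is a position the robot really visited, every relabeled episode ends with a success signal, so the critic sees non-trivial rewards even when the true goal was never reached. The goal-sampling strategy ("future", "final", or "episode" in the original HER formulation) is a design choice; this sketch uses episode-wide sampling, which may differ from what the authors used.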
