Yao Jialing, Ge Zhen
School of Automotive and Transportation Engineering, Nanjing Forestry University, Nanjing 210037, China.
Sensors (Basel). 2022 Oct 17;22(20):7881. doi: 10.3390/s22207881.
This paper proposes a deep reinforcement learning (DRL)-based path-tracking controller for an unmanned vehicle that autonomously learns path-tracking behavior by interacting with the CARLA environment. To address the Q-value overestimation and slow training speed of the DDPG algorithm, the controller adopts a deep deterministic policy gradient algorithm with a double critic network (DCN-DDPG): a trained model is obtained through offline learning and then sends control commands to the unmanned vehicle so that it drives along the specified route. The paper formulates path tracking as a Markov decision process, including the design of the state, action, and reward functions, and trains the control strategy in the Town04 urban scene of the CARLA simulator. The tracking task was completed under various working conditions, and the tracking performance was compared with the original DDPG algorithm, model predictive control (MPC), and pure pursuit. The results verify that the designed control strategy has good environmental adaptability, speed adaptability, and tracking performance.
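The double-critic mechanism the abstract describes can be sketched as follows. This is an illustrative reconstruction, not the paper's exact implementation: the function name `td_target` and its parameters are assumptions, and the sketch follows the common clipped double-Q idea of taking the minimum of two critics' target estimates to curb overestimation.

```python
def td_target(reward, gamma, q1_next, q2_next, done):
    """Bellman target for a double-critic DDPG variant (illustrative).

    Using the smaller of the two critics' next-state estimates
    biases the target downward, countering Q-value overestimation.
    """
    min_q = min(q1_next, q2_next)  # clipped double-Q estimate
    return reward + gamma * (0.0 if done else min_q)

# Example: reward 1.0, discount 0.99, critics disagree on the next Q-value.
y = td_target(reward=1.0, gamma=0.99, q1_next=2.0, q2_next=1.5, done=False)
# The target uses the lower estimate (1.5), giving 1.0 + 0.99 * 1.5.
```

In a full training loop, both critics would be regressed toward this single target `y`, while the actor is updated only from one critic's gradient, as in typical double-critic DDPG variants.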