Liu Weiguo, Xiang Zhiyu, Fang Han, Huo Ke, Wang Zixu
Information Science & Electronic Engineering, Zhejiang University, Hangzhou 310027, China.
National Innovation Center of Intelligent and Connected Vehicles, Beijing 100176, China.
Sensors (Basel). 2023 Aug 8;23(16):7021. doi: 10.3390/s23167021.
Autonomous driving based on deep reinforcement learning (DRL) is one of the most active frontier research fields worldwide. A DRL agent learns to make driving decisions independently by interacting with the environment and adjusting its driving strategy according to the feedback it receives, and the approach has been widely applied to end-to-end driving tasks. The field nevertheless faces several challenges. First, development on real vehicles is expensive, time-consuming, and risky. To expedite the testing, verification, and iteration of end-to-end DRL algorithms, this study designed and implemented a joint simulation development and validation platform based on VTD-CarSim and the TensorFlow deep learning framework, and conducted its research on that platform. Second, sparse reward signals can cause problems such as slow, sample-inefficient learning, and the agent must also be able to navigate unfamiliar environments and drive safely under a wide variety of weather and lighting conditions. To address the agent's poor generalization to unknown scenarios, this study proposed a deep deterministic policy gradient (DDPG) decision-making and planning method built on a multi-task fusion strategy. The main task (DRL-based decision-making and planning) and the auxiliary task (image semantic segmentation) were cross-fused, with the auxiliary task sharing part of the network with the main task, to reduce the risk of overfitting and improve generalization. The experimental results indicate, first, that the joint simulation development and validation platform is highly versatile: users can replace any default module with a customized algorithm and use the platform's remaining default modules to verify whether the new function improves overall performance. Second, the proposed multi-task fusion DRL strategy is competitive: it outperformed other DRL algorithms in certain tasks and improved the generalization ability of the vehicle decision-making and planning algorithm.
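The idea of sharing part of the network between the driving policy and the segmentation task can be sketched as follows. This is a minimal illustration only, not the architecture used in the study: the abstract does not give layer sizes, the fusion scheme, or the loss weighting, so the class `MultiTaskActor`, the helper names, the dimensions, and the `aux_weight` parameter are all assumptions. The sketch shows plain hard parameter sharing (one encoder, two heads) and does not attempt to reproduce the cross-fusion described above. It uses only the Python standard library so that it runs without TensorFlow.

```python
import math
import random


def linear(x, w, b):
    """Dense layer for a single input vector: y = W x + b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v):
    return [max(0.0, a) for a in v]


def init(rows, cols, rng):
    """Small random weight matrix and zero bias (illustrative initialisation)."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return w, [0.0] * rows


class MultiTaskActor:
    """Hypothetical actor: a shared encoder feeding an action head and a segmentation head."""

    def __init__(self, obs_dim, hidden, act_dim, n_pixels, n_classes, seed=0):
        rng = random.Random(seed)
        self.n_pixels, self.n_classes = n_pixels, n_classes
        self.enc = init(hidden, obs_dim, rng)               # shared by both tasks
        self.act = init(act_dim, hidden, rng)               # main task: driving action
        self.seg = init(n_pixels * n_classes, hidden, rng)  # auxiliary task: per-pixel classes

    def forward(self, obs):
        feat = relu(linear(obs, *self.enc))
        # tanh keeps each action component (e.g. steering, throttle) in [-1, 1]
        action = [math.tanh(a) for a in linear(feat, *self.act)]
        flat = linear(feat, *self.seg)
        c = self.n_classes
        logits = [flat[i * c:(i + 1) * c] for i in range(self.n_pixels)]
        return action, logits


def seg_cross_entropy(logits, labels):
    """Mean per-pixel softmax cross-entropy, computed with a stable log-sum-exp."""
    total = 0.0
    for row, y in zip(logits, labels):
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[y]
    return total / len(labels)


def fused_loss(q_value, seg_loss, aux_weight=0.5):
    """Standard DDPG actor objective (-Q) plus a weighted auxiliary loss (assumed form)."""
    return -q_value + aux_weight * seg_loss


model = MultiTaskActor(obs_dim=4, hidden=8, act_dim=2, n_pixels=3, n_classes=5)
action, logits = model.forward([0.1, 0.2, 0.3, 0.4])
aux = seg_cross_entropy(logits, labels=[0, 1, 2])
```

Because both heads read the same encoder features, gradients from the segmentation loss would also shape the representation used by the policy, which is the mechanism the abstract credits for reduced overfitting.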