Xu Tao, Meng Zhiwei, Lu Weike, Tong Zhongwen
National Key Laboratory of Automotive Chassis Integration and Bionics, Jilin University, Changchun 130015, China.
School of Rail Transportation, Soochow University, Suzhou 215031, China.
Sensors (Basel). 2024 Jul 31;24(15):4962. doi: 10.3390/s24154962.
The ability to make informed decisions in complex scenarios is crucial for intelligent automotive systems. Traditional expert rules and other methods often fall short in complex contexts. Recently, reinforcement learning has garnered significant attention due to its superior decision-making capabilities. However, inaccurate target network estimation limits its decision-making ability in complex scenarios. This paper focuses on the underestimation phenomenon and proposes an end-to-end autonomous driving decision-making method based on an improved TD3 algorithm. The method uses a forward-facing camera to capture data. A new critic network is introduced to form a triple-critic structure, which is combined with a target maximization operation to solve the underestimation problem in the TD3 algorithm. A multi-timestep averaging method is then used to address the policy instability caused by the new single critic. In addition, multi-vehicle unprotected left-turn and congested lane-center driving scenarios are constructed on the Carla platform to verify the algorithm. The results demonstrate that the method surpasses the baseline DDPG and TD3 algorithms in convergence speed, estimation accuracy, and policy stability.
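The abstract does not give the update equations, so the following is only an illustrative sketch of how a triple-critic target with maximization and multi-timestep averaging might be combined. Standard TD3 bootstraps from the minimum of two target critics, which can bias the target downward. The sketch assumes (this is not confirmed by the abstract) that the third critic's estimate is averaged over a sliding window of recent timesteps and that the larger of this average and the clipped double-Q value is used for bootstrapping. The names `td3_target`, `TripleCriticTarget`, and `window` are hypothetical.

```python
from collections import deque
from statistics import fmean


def td3_target(reward, gamma, q1_next, q2_next, done=False):
    """Standard TD3 target: bootstrap from the smaller of two target critics."""
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * min(q1_next, q2_next)


class TripleCriticTarget:
    """Hypothetical triple-critic target (an assumption, not the paper's exact rule).

    The third critic's next-state estimate is averaged over the last
    `window` calls to damp its fluctuations, then combined with the
    clipped double-Q value through a max to counter underestimation.
    """

    def __init__(self, gamma=0.99, window=3):
        self.gamma = gamma
        # Sliding window of recent third-critic estimates.
        self.q3_history = deque(maxlen=window)

    def __call__(self, reward, q1_next, q2_next, q3_next, done=False):
        self.q3_history.append(q3_next)
        q3_avg = fmean(self.q3_history)       # multi-timestep average
        clipped = min(q1_next, q2_next)       # TD3's clipped double-Q value
        bootstrap = max(clipped, q3_avg)      # target maximization step
        not_done = 0.0 if done else 1.0
        return reward + self.gamma * not_done * bootstrap


target = TripleCriticTarget(gamma=0.5, window=2)
y_first = target(1.0, 4.0, 5.0, 6.0)   # 1 + 0.5 * max(4, 6) = 4.0
y_second = target(1.0, 4.0, 5.0, 2.0)  # q3 average is (6 + 2) / 2 = 4, so 3.0
y_td3 = td3_target(1.0, 0.5, 4.0, 5.0)  # 1 + 0.5 * min(4, 5) = 3.0
```

In this toy run the first target exceeds the plain TD3 target because the third critic is higher than the clipped pair, while the windowed average keeps a single low estimate from swinging the second target sharply. Scalars stand in for the batched network outputs a real implementation would use.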