End-to-End Autonomous Driving Decision Method Based on Improved TD3 Algorithm in Complex Scenarios.

Authors

Xu Tao, Meng Zhiwei, Lu Weike, Tong Zhongwen

Affiliations

National Key Laboratory of Automotive Chassis Integration and Bionics, Jilin University, Changchun 130015, China.

School of Rail Transportation, Soochow University, Suzhou 215031, China.

Publication information

Sensors (Basel). 2024 Jul 31;24(15):4962. doi: 10.3390/s24154962.

Abstract

The ability to make informed decisions in complex scenarios is crucial for intelligent automotive systems. Traditional expert rules and other methods often fall short in complex contexts. Recently, reinforcement learning has garnered significant attention due to its superior decision-making capabilities. However, inaccurate value estimation by the target network limits its decision-making ability in complex scenarios. This paper focuses on the underestimation phenomenon and proposes an end-to-end autonomous driving decision-making method based on an improved TD3 algorithm. The method uses a forward-facing camera to capture data. A new critic network is introduced to form a triple-critic structure, which is combined with a target maximization operation to solve the underestimation problem in the TD3 algorithm. A multi-timestep averaging method is then used to address the policy instability caused by the new single critic. In addition, multi-vehicle unprotected left-turn and congested lane-center driving scenarios are constructed on the CARLA platform to verify the algorithm. The results demonstrate that the method surpasses the baseline DDPG and TD3 algorithms in convergence speed, estimation accuracy, and policy stability.
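The abstract does not give the target formula, so the following is only a minimal sketch of one plausible reading of "triple critic plus target maximization plus multi-timestep averaging", not the authors' implementation. It assumes the standard TD3 target takes the minimum of two target critics, that the third critic's estimate is averaged over several recent target-network snapshots, and that the final bootstrap value is the maximum of the two. The function names, the `q3_snapshots` argument, and the array shapes are all hypothetical.

```python
import numpy as np


def td3_target(q1, q2, reward, done, gamma=0.99):
    """Standard TD3 clipped double-Q target: bootstrap from min(Q1', Q2')."""
    q_next = np.minimum(q1, q2)
    return reward + gamma * (1.0 - done) * q_next


def triple_critic_target(q1, q2, q3_snapshots, reward, done, gamma=0.99):
    """Hypothetical triple-critic target (an assumption, not the paper's formula).

    q1, q2        : next-state values from the two TD3 target critics, shape (batch,)
    q3_snapshots  : next-state values from K recent snapshots of a third
                    target critic, shape (K, batch)
    """
    # Multi-timestep averaging: smooth the single third critic over K snapshots.
    q3_avg = np.mean(q3_snapshots, axis=0)
    # Target maximization: lift the pessimistic min(Q1', Q2') up to the
    # averaged third estimate when the latter is larger.
    q_next = np.maximum(np.minimum(q1, q2), q3_avg)
    return reward + gamma * (1.0 - done) * q_next


# Toy batch of two transitions; the second one is terminal.
q1 = np.array([1.0, 2.0])
q2 = np.array([0.5, 3.0])
q3_snapshots = np.array([[0.8, 1.0],
                         [1.2, 1.0]])
reward = np.array([0.0, 1.0])
done = np.array([0.0, 1.0])

y_td3 = td3_target(q1, q2, reward, done, gamma=0.9)
y_triple = triple_critic_target(q1, q2, q3_snapshots, reward, done, gamma=0.9)
```

Under these assumptions the triple-critic target is never below the plain TD3 target, which is the direction of correction needed for an underestimation bias; averaging over snapshots is what keeps a single noisy third critic from dominating the maximum.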

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eafe/11315049/c8e9b9f1ecf7/sensors-24-04962-g001.jpg
