End-to-End Autonomous Driving Decision Method Based on Improved TD3 Algorithm in Complex Scenarios

Authors

Xu Tao, Meng Zhiwei, Lu Weike, Tong Zhongwen

Affiliations

National Key Laboratory of Automotive Chassis Integration and Bionics, Jilin University, Changchun 130015, China.

School of Rail Transportation, Soochow University, Suzhou 215031, China.

Publication information

Sensors (Basel). 2024 Jul 31;24(15):4962. doi: 10.3390/s24154962.

DOI:10.3390/s24154962
PMID:39124010
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11315049/
Abstract

The ability to make informed decisions in complex scenarios is crucial for intelligent automotive systems. Traditional expert rules and other methods often fall short in complex contexts. Recently, reinforcement learning has garnered significant attention due to its superior decision-making capabilities. However, inaccurate target network estimation limits its decision-making ability in complex scenarios. This paper focuses on the underestimation phenomenon and proposes an end-to-end autonomous driving decision-making method based on an improved TD3 algorithm. The method uses a forward-facing camera to capture data. Introducing a new critic network to form a triple-critic structure, combined with a target maximization operation, solves the underestimation problem in the TD3 algorithm. A multi-timestep averaging method is then used to address the policy instability caused by the new single critic. In addition, the paper uses the Carla platform to construct multi-vehicle unprotected left-turn and congested lane-center driving scenarios and verifies the algorithm in them. The results demonstrate that the method surpasses the baseline DDPG and TD3 algorithms in convergence speed, estimation accuracy, and policy stability.
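
The abstract does not give the target formula, so the following is only a minimal sketch of one plausible reading of "triple-critic structure plus target maximization plus multi-timestep averaging". Everything in it is an assumption made for illustration: the names `td3_target`, `AveragedThirdCritic`, and `triple_critic_target`, the window size `k`, and in particular the way the third critic is combined with the clipped double-Q value. It is not the authors' implementation.

```python
from collections import deque


def td3_target(reward, gamma, q1_next, q2_next, done=False):
    """Standard TD3 clipped double-Q target: bootstrap from the smaller critic."""
    return reward + (0.0 if done else gamma * min(q1_next, q2_next))


class AveragedThirdCritic:
    """Hypothetical helper: averages the third critic's estimates over the last k steps."""

    def __init__(self, k):
        self.history = deque(maxlen=k)  # oldest estimate is dropped automatically

    def update(self, q3_next):
        self.history.append(q3_next)
        return sum(self.history) / len(self.history)


def triple_critic_target(reward, gamma, q1_next, q2_next, q3_avg, done=False):
    """Assumed combination: take the max of the clipped double-Q value and the
    time-averaged third-critic value, so the target is never lower than TD3's."""
    bootstrap = max(min(q1_next, q2_next), q3_avg)
    return reward + (0.0 if done else gamma * bootstrap)


# Toy numbers, chosen only to show the direction of the correction.
third = AveragedThirdCritic(k=3)
for q3 in (4.0, 6.0, 5.0):
    q3_avg = third.update(q3)

baseline = td3_target(1.0, 0.5, 3.0, 5.0)
improved = triple_critic_target(1.0, 0.5, 3.0, 5.0, q3_avg)
print(baseline, improved)
```

Under these assumptions the maximization can only raise the bootstrap value relative to the TD3 minimum, which is the direction needed to counter underestimation, while averaging the third critic over several steps damps the noise that a single extra critic would otherwise inject into the target.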

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eafe/11315049/c8e9b9f1ecf7/sensors-24-04962-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eafe/11315049/277d86a874b0/sensors-24-04962-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eafe/11315049/f8f133037901/sensors-24-04962-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eafe/11315049/bcda7c6cab0b/sensors-24-04962-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eafe/11315049/a28744dacb86/sensors-24-04962-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eafe/11315049/f6d01177d082/sensors-24-04962-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eafe/11315049/9a7552abe4fb/sensors-24-04962-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eafe/11315049/0fd2b6eb5419/sensors-24-04962-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eafe/11315049/9e0808277af7/sensors-24-04962-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eafe/11315049/137fdc7ffac4/sensors-24-04962-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eafe/11315049/d3fbde307913/sensors-24-04962-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eafe/11315049/1d08ef940129/sensors-24-04962-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eafe/11315049/1a04a0e224e0/sensors-24-04962-g013.jpg
