
PORF-DDPG: Learning Personalized Autonomous Driving Behavior with Progressively Optimized Reward Function.

Affiliation

The College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China.

Publication information

Sensors (Basel). 2020 Oct 1;20(19):5626. doi: 10.3390/s20195626.

DOI: 10.3390/s20195626
PMID: 33019643
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7582292/
Abstract

Autonomous driving with artificial intelligence technology is viewed as a promising route to putting autonomous vehicles on the road in the near future. In recent years, considerable progress has been made with Deep Reinforcement Learning (DRL) toward end-to-end autonomous driving. Still, driving safely and comfortably in real dynamic scenarios with DRL is nontrivial because the reward functions are typically pre-defined with expert knowledge. This paper proposes a human-in-the-loop DRL algorithm that learns personalized autonomous driving behavior progressively. Specifically, a progressively optimized reward function (PORF) learning model is built and integrated into the Deep Deterministic Policy Gradient (DDPG) framework; the combination is called PORF-DDPG. PORF consists of two parts: the first is a typical pre-defined reward function on the system state; the second, which is the main contribution of this paper, is a Deep Neural Network (DNN) that represents the human observer's intention to adjust the driving behavior. The DNN-based reward model is learned progressively from front-view images as input, through active human supervision and intervention. The proposed approach is potentially useful for driving in dynamic, constrained scenarios where dangerous collision events might occur frequently with classic DRL. The experimental results show that the proposed autonomous driving behavior learning method exhibits online learning capability and environmental adaptability.
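The two-part reward described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the form of `predefined_reward`, the state fields, the use of a small linear model in place of the DNN, the placeholder feature vector standing in for a front-view image, and the scalar `human_signal` standing in for the observer's supervision. The sketch only shows how a hand-written reward on the state and a progressively learned, human-shaped term might be summed before being passed to a DDPG-style learner.

```python
import math


def predefined_reward(speed, heading_error, lateral_offset):
    """Hand-written reward on the vehicle state (hypothetical form)."""
    # Reward forward progress along the lane, penalize drifting from the center.
    return speed * math.cos(heading_error) - abs(lateral_offset)


class LearnedReward:
    """Linear stand-in for the DNN reward model (assumption: not a real DNN)."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, features):
        return sum(w * x for w, x in zip(self.w, features)) + self.b

    def update(self, features, human_signal):
        """One gradient step on squared error toward the human's feedback."""
        err = self.predict(features) - human_signal
        self.w = [w - self.lr * err * x for w, x in zip(self.w, features)]
        self.b -= self.lr * err


def porf_reward(state, features, model):
    """Total reward: predefined part plus learned, human-shaped part."""
    return predefined_reward(**state) + model.predict(features)


model = LearnedReward(n_features=2)
state = {"speed": 10.0, "heading_error": 0.0, "lateral_offset": 0.5}
risky_view = [1.0, 0.0]  # placeholder for features of a front-view image

before = porf_reward(state, risky_view, model)
for _ in range(50):  # the observer repeatedly flags this view as undesirable
    model.update(risky_view, human_signal=-1.0)
after = porf_reward(state, risky_view, model)
```

In this toy setting the total reward for the flagged view drops as feedback accumulates, while the predefined part stays fixed. That is the sense in which the reward is optimized progressively: only the learned term changes between interventions.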


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc5f/7582292/6b4128ee123c/sensors-20-05626-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc5f/7582292/820a6fd02c69/sensors-20-05626-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc5f/7582292/c158311e5161/sensors-20-05626-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc5f/7582292/b8c89cd7874b/sensors-20-05626-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc5f/7582292/fbd96b4316b1/sensors-20-05626-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc5f/7582292/223034032657/sensors-20-05626-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc5f/7582292/4cf81ec0f646/sensors-20-05626-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc5f/7582292/339bf11e90da/sensors-20-05626-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc5f/7582292/3536f20db1e8/sensors-20-05626-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc5f/7582292/8a17d6a4a207/sensors-20-05626-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc5f/7582292/f66921008df8/sensors-20-05626-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc5f/7582292/e813afd38155/sensors-20-05626-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc5f/7582292/46a0f6380f36/sensors-20-05626-g013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc5f/7582292/0d6f138b3c16/sensors-20-05626-g014.jpg
