

Weakly Supervised Reinforcement Learning for Autonomous Highway Driving via Virtual Safety Cages.

Affiliations

Connected and Autonomous Vehicles Lab, University of Surrey, Guildford GU2 7XH, UK.

Centre for Vision Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, UK.

Publication

Sensors (Basel). 2021 Mar 13;21(6):2032. doi: 10.3390/s21062032.

DOI: 10.3390/s21062032
PMID: 33805601
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8001915/
Abstract

The use of neural networks and reinforcement learning has become increasingly popular in autonomous vehicle control. However, the opaqueness of the resulting control policies presents a significant barrier to deploying neural network-based control in autonomous vehicles. In this paper, we present a reinforcement learning based approach to autonomous vehicle longitudinal control, where the rule-based safety cages provide enhanced safety for the vehicle as well as weak supervision to the reinforcement learning agent. By guiding the agent to meaningful states and actions, this weak supervision improves the convergence during training and enhances the safety of the final trained policy. This rule-based supervisory controller has the further advantage of being fully interpretable, thereby enabling traditional validation and verification approaches to ensure the safety of the vehicle. We compare models with and without safety cages, as well as models with optimal and constrained model parameters, and show that the weak supervision consistently improves the safety of exploration, speed of convergence, and model performance. Additionally, we show that when the model parameters are constrained or sub-optimal, the safety cages can enable a model to learn a safe driving policy even when the model could not be trained to drive through reinforcement learning alone.
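The abstract's core mechanism, a rule-based safety cage that overrides unsafe longitudinal actions and whose interventions can serve as a weak-supervision signal to the RL agent, can be sketched roughly as follows. The function name, the time-headway rule, and all thresholds here are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of a rule-based "virtual safety cage" wrapping an RL policy's
# longitudinal action. Names and thresholds are illustrative, not from the paper.

def safety_cage(ego_speed, gap, agent_accel,
                min_time_headway=2.0, max_brake=-6.0):
    """Override the agent's acceleration when the time headway to the
    lead vehicle drops below a safe threshold."""
    # Time headway: gap to the lead vehicle divided by ego speed (seconds).
    headway = gap / ego_speed if ego_speed > 0 else float("inf")
    if headway < min_time_headway:
        # Cage intervenes: command hard braking instead of the agent's action.
        return max_brake, True   # (action, intervened flag)
    return agent_accel, False


# The intervention flag is the weak-supervision hook: e.g. add a penalty to
# the reward whenever the cage fires, steering the agent toward safe states.
action, intervened = safety_cage(ego_speed=20.0, gap=10.0, agent_accel=1.0)
print(action, intervened)  # headway 0.5 s < 2.0 s, so the cage fires
```

Because the cage is a fixed, human-readable rule, it can be validated with conventional methods even though the learned policy itself remains opaque, which is the interpretability advantage the abstract highlights.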


Figures 1-9 (PMC):
Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a498/8001915/e768e09780d8/sensors-21-02032-g001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a498/8001915/5d82ffc832bd/sensors-21-02032-g002.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a498/8001915/f11559e5f6f1/sensors-21-02032-g003.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a498/8001915/35b30a5593b5/sensors-21-02032-g004.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a498/8001915/656009cd64b7/sensors-21-02032-g005.jpg
Figure 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a498/8001915/9b16ded9aaba/sensors-21-02032-g006.jpg
Figure 7: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a498/8001915/04a8a77fc051/sensors-21-02032-g007.jpg
Figure 8: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a498/8001915/a72d4bb534ec/sensors-21-02032-g008.jpg
Figure 9: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a498/8001915/792cb739753e/sensors-21-02032-g009.jpg

Similar articles

1. Weakly Supervised Reinforcement Learning for Autonomous Highway Driving via Virtual Safety Cages.
Sensors (Basel). 2021 Mar 13;21(6):2032. doi: 10.3390/s21062032.
2. Towards Robust Decision-Making for Autonomous Highway Driving Based on Safe Reinforcement Learning.
Sensors (Basel). 2024 Jun 26;24(13):4140. doi: 10.3390/s24134140.
3. Dense reinforcement learning for safety validation of autonomous vehicles.
Nature. 2023 Mar;615(7953):620-627. doi: 10.1038/s41586-023-05732-2. Epub 2023 Mar 22.
4. Double Deep Q-Learning and Faster R-CNN-Based Autonomous Vehicle Navigation and Obstacle Avoidance in Dynamic Environment.
Sensors (Basel). 2021 Feb 20;21(4):1468. doi: 10.3390/s21041468.
5. Generalized Single-Vehicle-Based Graph Reinforcement Learning for Decision-Making in Autonomous Driving.
Sensors (Basel). 2022 Jun 29;22(13):4935. doi: 10.3390/s22134935.
6. Research into Autonomous Vehicles Following and Obstacle Avoidance Based on Deep Reinforcement Learning Method under Map Constraints.
Sensors (Basel). 2023 Jan 11;23(2):844. doi: 10.3390/s23020844.
7. Safe Reinforcement Learning With Stability Guarantee for Motion Planning of Autonomous Vehicles.
IEEE Trans Neural Netw Learn Syst. 2021 Dec;32(12):5435-5444. doi: 10.1109/TNNLS.2021.3084685. Epub 2021 Nov 30.
8. Towards autonomous neuroprosthetic control using Hebbian reinforcement learning.
J Neural Eng. 2013 Dec;10(6):066005. doi: 10.1088/1741-2560/10/6/066005. Epub 2013 Oct 8.
9. Multi-Agent Reinforcement Learning for Traffic Flow Management of Autonomous Vehicles.
Sensors (Basel). 2023 Feb 21;23(5):2373. doi: 10.3390/s23052373.
10. Design and Implementation of Intelligent Agent Training Systems for Virtual Vehicles.
Sensors (Basel). 2021 Jan 12;21(2):492. doi: 10.3390/s21020492.

Cited by

1. Survey of Autonomous Vehicles' Collision Avoidance Algorithms.
Sensors (Basel). 2025 Jan 10;25(2):395. doi: 10.3390/s25020395.
2. A Deep Reinforcement Learning Strategy for Surrounding Vehicles-Based Lane-Keeping Control.
Sensors (Basel). 2023 Dec 15;23(24):9843. doi: 10.3390/s23249843.
