Zhao Rui, Wang Kui, Che Wenbo, Li Yun, Fan Yuze, Gao Fei
College of Automotive Engineering, Jilin University, Changchun 130025, China.
School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081, China.
Sensors (Basel). 2024 Apr 22;24(8):2657. doi: 10.3390/s24082657.
Adaptive cruise control (ACC) enables efficient, safe, and intelligent vehicle control by autonomously adjusting speed and maintaining a safe following distance from the vehicle ahead. This paper proposes a novel ACC system, Safety-First Reinforcement Learning Adaptive Cruise Control (SFRL-ACC). The system leverages the model-free nature and fast real-time inference of Deep Reinforcement Learning (DRL) to overcome the modeling difficulty and low computational efficiency of current optimization-based ACC methods, while retaining their safety advantages and optimizing ride comfort. First, we cast the ACC problem as a safe DRL formulation, a Constrained Markov Decision Process (CMDP), by carefully designing the state, action, reward, and cost functions. We then propose SFRL-ACC, an ACC algorithm based on Projected Constrained Policy Optimization (PCPO) and tailored to solving this CMDP. PCPO uses safety constraints to further restrict the trust region defined by the Kullback-Leibler (KL) divergence, so that DRL policy updates maximize performance while keeping safety costs within their limits. Finally, we train an SFRL-ACC policy and compare its computation time, traffic efficiency, ride comfort, and safety with state-of-the-art model predictive control (MPC)-based ACC methods. The experimental results show that the proposed method outperforms these baselines in all four aspects.
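The two-stage update that the abstract attributes to PCPO (a reward-improving step inside the KL trust region, followed by a projection back onto the safety-constraint set) can be illustrated with a small numerical sketch. This is not the authors' implementation: it follows the linearized, closed-form update commonly given for PCPO, and the function name `pcpo_step`, the toy 2-D parameters, and all numeric values are assumptions chosen for illustration. Here `g` is the reward gradient, `a` the cost gradient, `H` the Fisher information matrix that defines the KL trust region, `delta` the trust-region size, and `b` the current expected cost minus its limit.

```python
import numpy as np

def pcpo_step(theta, g, a, b, H, delta=0.01, projection="kl"):
    """One linearized PCPO-style update (illustrative sketch, not the paper's code).

    theta: current policy parameters
    g:     gradient of the reward objective
    a:     gradient of the safety-cost objective
    b:     current expected cost minus the cost limit (negative means feasible)
    H:     Fisher information matrix (Hessian of the KL divergence)
    """
    # Step 1: reward improvement on the boundary of the KL trust region.
    Hinv_g = np.linalg.solve(H, g)
    step_scale = np.sqrt(2.0 * delta / (g @ Hinv_g))
    theta_half = theta + step_scale * Hinv_g

    # Step 2: project onto the linearized constraint a.(theta' - theta) + b <= 0.
    # The projection metric is H (KL projection) or the identity (L2 projection).
    L = H if projection == "kl" else np.eye(len(theta))
    Linv_a = np.linalg.solve(L, a)
    violation = step_scale * (a @ Hinv_g) + b  # linearized cost excess after step 1
    coef = max(0.0, violation / (a @ Linv_a))  # zero when step 1 is already safe
    return theta_half - coef * Linv_a

# Toy example with hypothetical numbers.
theta = np.zeros(2)
g = np.array([1.0, 0.5])
a = np.array([1.0, 0.0])
H = np.diag([2.0, 1.0])
theta_new = pcpo_step(theta, g, a, b=-0.05, H=H, delta=0.01)
```

In this toy case the reward step alone would exceed the linearized cost limit, so the projection pulls the first parameter back until the constraint is exactly active. When the intermediate point is already feasible, the projection coefficient is zero and the update reduces to a plain trust-region step.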