IEEE Trans Cybern. 2022 Jun;52(6):5267-5277. doi: 10.1109/TCYB.2020.3029077. Epub 2022 Jun 16.
Through vehicle-to-vehicle (V2V) communication, both human-driven and autonomous vehicles can actively exchange data such as velocities and bumper-to-bumper distances. Using the shared data, control laws with improved performance can be designed for connected and autonomous vehicles (CAVs). In this article, taking into account human-vehicle interaction and heterogeneous driver behavior, an adaptive optimal control design method is proposed for a mixed platoon consisting of multiple preceding human-driven vehicles and one CAV at the tail. It is shown that, by using reinforcement learning and adaptive dynamic programming techniques, a near-optimal controller for the CAV can be learned from real-time data obtained through V2V communication, without precise knowledge of the car-following parameters of any driver in the platoon. The proposed method allows the CAV controller to adapt to the different platoon dynamics caused by unknown and heterogeneous driver-dependent parameters. To improve safety during the learning process, our off-policy learning algorithm can leverage both historical data and data collected in real time, which considerably reduces the learning time. The effectiveness and efficiency of the proposed method are demonstrated by rigorous proofs and microscopic traffic simulations.
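The abstract does not spell out the learning algorithm, so the following is only a minimal sketch of the general off-policy idea under loudly stated assumptions. It swaps in discrete-time, model-free Q-learning policy iteration for a linear-quadratic problem (a textbook adaptive dynamic programming technique, not the authors' algorithm), and it uses a hypothetical two-state error model (`A`, `B`) and hypothetical helper names (`collect_data`, `off_policy_pi`, `riccati_gain`). What it tries to show is that one batch of transitions, logged under an exploratory behavior policy, is reused in every iteration while the learner never reads `A` or `B`; `riccati_gain` uses the model only as a reference for checking the result.

```python
import random

# Hypothetical toy model of a CAV's gap-keeping error dynamics (NOT the paper's
# platoon model). The learner never reads A or B; it only sees logged transitions.
A = [[1.0, 0.1], [0.0, 1.0]]
B = [0.0, 0.1]   # single input: the CAV's acceleration command
QC = [1.0, 1.0]  # diagonal state weights of the quadratic cost
R = 1.0          # input weight

def step(x, u):
    """One sample of the (hidden) plant: x_next = A x + B u."""
    return [A[0][0] * x[0] + A[0][1] * x[1] + B[0] * u,
            A[1][0] * x[0] + A[1][1] * x[1] + B[1] * u]

def cost(x, u):
    return QC[0] * x[0] ** 2 + QC[1] * x[1] ** 2 + R * u ** 2

def features(x, u):
    """Quadratic basis: Q(x, u) = theta . features, theta = [h11, h22, h33, h12, h13, h23]."""
    z = [x[0], x[1], u]
    return [z[0] ** 2, z[1] ** 2, z[2] ** 2,
            2 * z[0] * z[1], 2 * z[0] * z[2], 2 * z[1] * z[2]]

def solve(M, v):
    """Solve M theta = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    theta = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * theta[c] for c in range(r + 1, n))
        theta[r] = (M[r][n] - s) / M[r][r]
    return theta

def collect_data(k0, n_samples=200, seed=0):
    """Run an exploratory behavior policy u = -k0 x + noise and log (x, u, x_next)."""
    rng = random.Random(seed)
    x, data = [1.0, 0.0], []
    for _ in range(n_samples):
        u = -(k0[0] * x[0] + k0[1] * x[1]) + rng.uniform(-1.0, 1.0)
        x_next = step(x, u)
        data.append((x, u, x_next))
        x = x_next
    return data

def off_policy_pi(data, k0, iters=10):
    """Policy iteration on a fixed batch: least-squares evaluation of Q_K, then greedy K."""
    k = list(k0)
    for _ in range(iters):
        M = [[0.0] * 6 for _ in range(6)]
        v = [0.0] * 6
        for x, u, x_next in data:
            u_next = -(k[0] * x_next[0] + k[1] * x_next[1])  # target-policy action
            # Bellman equation: Q_K(x, u) - Q_K(x_next, u_next) = cost(x, u)
            psi = [a - b for a, b in zip(features(x, u), features(x_next, u_next))]
            c = cost(x, u)
            for i in range(6):
                v[i] += psi[i] * c
                for j in range(6):
                    M[i][j] += psi[i] * psi[j]
        _, _, h33, _, h13, h23 = solve(M, v)
        k = [h13 / h33, h23 / h33]  # argmin_u Q(x, u) gives u = -(H_uu^-1 H_ux) x
    return k

def riccati_gain(iters=5000):
    """Model-based reference (uses A and B): iterate the discrete Riccati recursion."""
    P = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(iters):
        PB = [P[0][0] * B[0] + P[0][1] * B[1], P[1][0] * B[0] + P[1][1] * B[1]]
        s = R + B[0] * PB[0] + B[1] * PB[1]                       # R + B^T P B
        BPA = [PB[0] * A[0][j] + PB[1] * A[1][j] for j in range(2)]  # B^T P A
        k = [BPA[0] / s, BPA[1] / s]
        PA = [[sum(P[i][m] * A[m][j] for m in range(2)) for j in range(2)] for i in range(2)]
        APA = [[sum(A[m][i] * PA[m][j] for m in range(2)) for j in range(2)] for i in range(2)]
        P = [[(QC[i] if i == j else 0.0) + APA[i][j] - BPA[i] * BPA[j] / s
              for j in range(2)] for i in range(2)]
    return k

if __name__ == "__main__":
    k0 = [1.0, 2.0]          # stabilizing initial gain: closed-loop poles at 0.9 (double)
    data = collect_data(k0)  # one batch of "historical" data, reused in every iteration
    print("learned K:", off_policy_pi(data, k0))
    print("Riccati K:", riccati_gain())
```

Because the toy data are noise-free and the model is linear, the least-squares policy evaluation is exact, so the learned gain should agree with the Riccati gain up to rounding. With sensor noise or nonlinear, heterogeneous driver behavior, as in a real mixed platoon, this agreement would only be approximate.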