

Effects analysis of reward functions on reinforcement learning for traffic signal control.

Affiliations

Department of Transportation Engineering, University of Seoul, Seoul, Korea.

Civil and Environmental Engineering, University of Windsor, Windsor, Canada.

Publication Information

PLoS One. 2022 Nov 21;17(11):e0277813. doi: 10.1371/journal.pone.0277813. eCollection 2022.

DOI: 10.1371/journal.pone.0277813
PMID: 36409713
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9678263/
Abstract

The increasing traffic demand in urban areas frequently causes traffic congestion, which can be managed only through intelligent traffic signal controls. Although many recent studies have focused on reinforcement learning for traffic signal control (RL-TSC), most have focused on improving performance from an intersection perspective, targeting virtual simulation. The performance indexes from intersection perspectives are averaged by the weighted traffic flow; therefore, if the balance of each movement is not considered, the green time may be overly concentrated on the movements of heavy flow rates. Furthermore, as the ultimate purpose of traffic signal control research is to apply these controls to the real-world intersections, it is necessary to consider the real-world constraints. Hence, this study aims to design RL-TSC considering real-world applicability and confirm the appropriate design of the reward function. The limitations of the detector in the real world and the dual-ring traffic signal system are taken into account in the model design to facilitate real-world application. To design the reward for balancing traffic movements, we define the average delay weighted by traffic volume per lane and entropy of delay in the reward function. Model training is performed at the prototype intersection for ensuring scalability to multiple intersections. The model after prototype pre-training is evaluated by applying it to a network with two intersections without additional training. As a result, the reward function considering the equality of traffic movements shows the best performance. The proposed model reduces the average delay by more than 7.4% and 15.0% compared to the existing real-time adaptive signal control at two intersections, respectively.
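To make the reward design concrete, the sketch below illustrates how the two terms named in the abstract — average delay weighted by traffic volume per lane, and entropy of delay across movements — could be combined into a scalar reward for an RL-TSC agent. This is a minimal illustration under assumptions, not the paper's implementation: the coefficients alpha and beta, the sign convention, and the additive combination are placeholders chosen for the example.

```python
import numpy as np

def delay_entropy(delays):
    """Shannon entropy of the normalized delay distribution across movements.

    Higher entropy means delay is spread more evenly over the movements,
    i.e. green time is not overly concentrated on the heaviest flows.
    """
    delays = np.asarray(delays, dtype=float)
    total = delays.sum()
    if total <= 0:
        return 0.0
    p = delays / total
    p = p[p > 0]  # drop zero-probability terms to avoid log(0)
    return float(-(p * np.log(p)).sum())

def reward(delays, volumes, alpha=1.0, beta=1.0):
    """Hypothetical per-step reward combining the two terms named in the abstract:

    1. average delay weighted by traffic volume per lane (to be minimized), and
    2. entropy of delay across movements (to be maximized, for balance).

    alpha, beta and the additive combination are illustrative placeholders,
    not the weighting actually used in the paper.
    """
    delays = np.asarray(delays, dtype=float)
    volumes = np.asarray(volumes, dtype=float)
    weighted_avg_delay = (volumes * delays).sum() / max(volumes.sum(), 1e-9)
    return -alpha * weighted_avg_delay + beta * delay_entropy(delays)

# Two scenarios with similar volume-weighted delay but different balance:
balanced   = reward(delays=[30, 32, 29, 31], volumes=[200, 180, 210, 190])
unbalanced = reward(delays=[5, 5, 5, 107],   volumes=[200, 180, 210, 190])
print(balanced, unbalanced)  # the balanced scenario receives the higher reward
```

In this formulation the weighted-delay term penalizes overall congestion, while the entropy term rewards spreading delay evenly across movements, which is the balancing effect the abstract attributes to the equality-aware reward.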


[Figures 1–11 (pone.0277813.g001–g011) are available in the PMC full-text record linked above.]

Similar Articles

1. Effects analysis of reward functions on reinforcement learning for traffic signal control. PLoS One. 2022 Nov 21;17(11):e0277813. doi: 10.1371/journal.pone.0277813. eCollection 2022.
2. Biased Pressure: Cyclic Reinforcement Learning Model for Intelligent Traffic Signal Control. Sensors (Basel). 2022 Apr 6;22(7):2818. doi: 10.3390/s22072818.
3. A scalable approach to optimize traffic signal control with federated reinforcement learning. Sci Rep. 2023 Nov 6;13(1):19184. doi: 10.1038/s41598-023-46074-3.
4. Self-learning adaptive traffic signal control for real-time safety optimization. Accid Anal Prev. 2020 Oct;146:105713. doi: 10.1016/j.aap.2020.105713. Epub 2020 Aug 18.
5. How to Improve Urban Intelligent Traffic? A Case Study Using Traffic Signal Timing Optimization Model Based on Swarm Intelligence Algorithm. Sensors (Basel). 2021 Apr 8;21(8):2631. doi: 10.3390/s21082631.
6. Real-time signal-vehicle coupled control: An application of connected vehicle data to improve intersection safety. Accid Anal Prev. 2021 Nov;162:106389. doi: 10.1016/j.aap.2021.106389. Epub 2021 Sep 21.
7. Cooperative Traffic Signal Control with Traffic Flow Prediction in Multi-Intersection. Sensors (Basel). 2019 Dec 24;20(1):137. doi: 10.3390/s20010137.
8. Multi-Objective Optimization Method for Signalized Intersections in Intelligent Traffic Network. Sensors (Basel). 2023 Jul 11;23(14):6303. doi: 10.3390/s23146303.
9. Deep Reinforcement Learning for Traffic Signal Control Model and Adaptation Study. Sensors (Basel). 2022 Nov 11;22(22):8732. doi: 10.3390/s22228732.
10. MARLens: Understanding Multi-Agent Reinforcement Learning for Traffic Signal Control via Visual Analytics. IEEE Trans Vis Comput Graph. 2025 Jul;31(7):4018-4033. doi: 10.1109/TVCG.2024.3392587.

Cited By

1. MAGT-toll: A multi-agent reinforcement learning approach to dynamic traffic congestion pricing. PLoS One. 2024 Nov 18;19(11):e0313828. doi: 10.1371/journal.pone.0313828. eCollection 2024.

References

1. Cooperative Deep Reinforcement Learning for Large-Scale Traffic Grid Signal Control. IEEE Trans Cybern. 2020 Jun;50(6):2687-2700. doi: 10.1109/TCYB.2019.2904742. Epub 2019 Mar 29.