Adaptive urban traffic signal control based on enhanced deep reinforcement learning

Authors

Cai Changjian, Wei Min

Affiliation

School of Electronic Engineering, Xi'an Shiyou University, Xi'an, 710065, Shaanxi, China.

Publication

Sci Rep. 2024 Jun 19;14(1):14116. doi: 10.1038/s41598-024-64885-w.

DOI: 10.1038/s41598-024-64885-w

PMID: 38898047

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11186829/

Abstract

One focus of intelligent transportation research is the intelligent control of traffic signals (TS), which aims to improve the efficiency of urban road networks through specific algorithms. Deep reinforcement learning (DRL) algorithms have become the mainstream approach, but they select training samples inefficiently, which slows convergence. Improving model robustness is also essential for adapting to diverse traffic conditions. This paper therefore proposes an enhanced DRL-based method for traffic signal control (TSC). The method combines a dueling network with double Q-learning to alleviate the Q-value overestimation problem of DRL, introduces a prioritized sampling mechanism so that the samples stored in replay memory are used more efficiently, and injects noise parameters into the neural network during training to improve its robustness. High-dimensional real-time traffic information is represented as matrices, a phase-cycled action space guides the agent's decisions, and a reward function that closely reflects real-world conditions guides model training. Experimental results show faster convergence and the best results on metrics such as queue length and waiting time. Further tests confirm the method's robustness across different traffic flow scenarios.
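The abstract names several standard DRL components without giving their formulas. The following minimal sketch, in plain Python with no deep-learning framework, illustrates textbook forms of them: the mean-subtracted dueling aggregation, the double-DQN target, proportional prioritized sampling with importance-sampling weights, a factorised-Gaussian noisy linear layer, and one possible reading of a phase-cycled action space (keep the current phase or advance to the next one). All function names and hyperparameters (`gamma`, `alpha`, `beta`, four phases) are hypothetical and not taken from the paper; its exact variants, state matrices, and reward function may differ.

```python
import math
import random


def dueling_q(value, advantages):
    """Dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean over a' of A(s,a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_q_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double Q-learning target: the online network picks the next action and
    the target network scores it, which curbs overestimation bias."""
    if done:
        return reward
    best = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best]


def prioritized_sample(td_errors, batch_size, alpha=0.6, beta=0.4,
                       eps=1e-6, rng=random):
    """Proportional prioritized sampling with importance-sampling weights.

    P(i) = p_i**alpha / sum_k p_k**alpha, where p_i = |delta_i| + eps.
    w_i = (N * P(i))**(-beta), divided by the largest weight in the batch.
    """
    scaled = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(scaled)
    probs = [p / total for p in scaled]
    n = len(td_errors)
    indices = rng.choices(range(n), weights=probs, k=batch_size)
    weights = [(n * probs[i]) ** (-beta) for i in indices]
    max_w = max(weights)
    return indices, [w / max_w for w in weights]


def _signed_sqrt(v):
    """Noise-shaping function f(x) = sgn(x) * sqrt(|x|) used by factorised NoisyNets."""
    return math.copysign(math.sqrt(abs(v)), v)


def noisy_linear(x, w_mu, w_sigma, b_mu, b_sigma, rng=random):
    """Forward pass of a factorised-Gaussian noisy linear layer.

    eps_in[i] and eps_out[j] already hold f(standard normal noise), so the
    effective weight is w_mu[j][i] + w_sigma[j][i] * eps_out[j] * eps_in[i]
    and the effective bias is b_mu[j] + b_sigma[j] * eps_out[j].
    """
    eps_in = [_signed_sqrt(rng.gauss(0.0, 1.0)) for _ in x]
    eps_out = [_signed_sqrt(rng.gauss(0.0, 1.0)) for _ in b_mu]
    out = []
    for j in range(len(b_mu)):
        total = b_mu[j] + b_sigma[j] * eps_out[j]
        for i, xi in enumerate(x):
            total += (w_mu[j][i] + w_sigma[j][i] * eps_out[j] * eps_in[i]) * xi
        out.append(total)
    return out


def next_phase(current_phase, action, num_phases=4):
    """Assumed phase-cycled action space: action 0 keeps the current phase,
    action 1 advances to the next phase in a fixed cyclic order."""
    return (current_phase + action) % num_phases


if __name__ == "__main__":
    print(dueling_q(1.0, [1.0, 2.0, 3.0]))  # [0.0, 1.0, 2.0]
    print(double_q_target(1.0, [0.1, 0.9, 0.5], [2.0, 3.0, 10.0], gamma=0.5))  # 2.5
    print(prioritized_sample([0.1, 0.5, 2.0], batch_size=4, rng=random.Random(0)))
    print(next_phase(3, 1))  # 0
```

In the double-DQN example, the online network selects action 1, so the target is 1 + 0.5 × 3 = 2.5; taking the plain maximum of the target network's values would instead give 1 + 0.5 × 10 = 6, which shows how decoupling action selection from evaluation damps overestimation. Setting every `w_sigma`/`b_sigma` to zero reduces the noisy layer to an ordinary linear layer, so the learned noise scales decide how much exploration noise survives training.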
