Zhao Rui, Hu Haofeng, Li Yun, Fan Yuze, Gao Fei, Gao Zhenhai
College of Automotive Engineering, Jilin University, Changchun 130025, China.
Graduate School of Information and Science Technology, The University of Tokyo, Tokyo 113-8654, Japan.
Sensors (Basel). 2024 Sep 25;24(19):6202. doi: 10.3390/s24196202.
Urban traffic congestion poses significant economic and environmental challenges worldwide. To mitigate these issues, Adaptive Traffic Signal Control (ATSC) has emerged as a promising solution, and recent advances in deep reinforcement learning (DRL) have further enhanced its capabilities. This paper introduces a novel DRL-based ATSC approach, the Sequence Decision Transformer (SDT), which combines DRL with attention mechanisms and leverages the sequence decision models used in advanced natural language processing, adapted here to the complexities of urban traffic management. First, the ATSC problem is modeled as a Markov Decision Process (MDP), with the observation space, action space, and reward function carefully defined. We then propose SDT, tailored specifically to solve this MDP. The SDT model uses a transformer-based encoder-decoder architecture in an actor-critic structure: the encoder processes observations and outputs both encoded data for the decoder and value estimates for parameter updates, while the decoder, acting as the policy network, outputs the agent's actions. Proximal Policy Optimization (PPO) updates the policy network from historical data, improving decision-making in ATSC. This approach significantly reduces training time, handles larger observation spaces, captures dynamic changes in traffic conditions more accurately, and increases traffic throughput. Finally, the SDT model is trained and evaluated in synthetic scenarios, comparing vehicle count, average speed, and queue length against three baselines: PPO, a DQN tailored for ATSC, and FRAP, a state-of-the-art ATSC algorithm. On these three metrics, SDT improves on traditional ATSC algorithms by 26.8%, 150%, and 21.7%, and on FRAP by 18%, 30%, and 15.6%. This research underscores the potential of integrating Large Language Models (LLMs) with DRL for traffic management, offering a promising solution to urban congestion.
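The encoder-decoder actor-critic structure described in the abstract can be sketched as follows. This is a minimal illustration assuming PyTorch; the class name SDTActorCritic, all dimensions, the mean-pooled value head, and the learned single-token decoder query are our assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch of a transformer encoder-decoder actor-critic for ATSC,
# assuming PyTorch. Dimensions and pooling choices are illustrative only.
import torch
import torch.nn as nn

class SDTActorCritic(nn.Module):
    def __init__(self, obs_dim, n_actions, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(obs_dim, d_model)  # per-step observation embedding
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        dec_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, n_layers)
        self.value_head = nn.Linear(d_model, 1)           # critic: value estimate
        self.policy_head = nn.Linear(d_model, n_actions)  # actor: signal-phase logits
        # Learned query token that the decoder attends over the encoded memory with
        self.query = nn.Parameter(torch.zeros(1, 1, d_model))

    def forward(self, obs):
        # obs: (batch, seq_len, obs_dim) -- a sequence of traffic observations
        memory = self.encoder(self.embed(obs))
        # Critic branch: pool the encoding and estimate V(s) for parameter updates
        value = self.value_head(memory.mean(dim=1))
        # Actor branch: the decoder acts as the policy network over the encoding
        q = self.query.expand(obs.size(0), -1, -1)
        dec_out = self.decoder(q, memory)
        logits = self.policy_head(dec_out.squeeze(1))
        return torch.distributions.Categorical(logits=logits), value
```

Sharing the encoder between the value head and the policy decoder mirrors the actor-critic structure the abstract describes, and the returned action distribution and value estimate are precisely the quantities a PPO clipped-surrogate update would consume.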