Lin Chuang-Chieh, Ho Ming-Chu, Hung Chih-Chieh, Hsu Hui-Huang
Department of Computer Science and Engineering, National Taiwan Ocean University, Keelung City, 202301, Taiwan.
Department of Management Information Systems, National Chung Hsing University, Taichung City, 402202, Taiwan.
Sci Rep. 2025 Jul 15;15(1):25609. doi: 10.1038/s41598-025-02303-5.
Accurate travel time prediction (TTP) is essential for freeway users, including drivers, administrators, and freight-related companies, because it enables them to plan trips effectively and mitigate traffic congestion. However, TTP remains a complex challenge even for researchers because it is difficult to capture the many diverse factors involved, such as driver behavior, rush hours, special events, and traffic incidents. A multitude of studies have proposed methods to address this issue, yet these approaches often involve multiple stages and steps, including data preprocessing, feature selection, data imputation, and prediction modeling. The intricacy of these processes makes it difficult to pinpoint which steps or factors most significantly influence prediction accuracy. In this paper, we investigate the impact of the individual steps on TTP accuracy by examining existing methods. Beginning with the data preprocessing phase, we evaluate the effect of deep learning, interpolation, and maximum-value imputation techniques for handling missing values. We also examine the influence of temporal features and weather conditions on prediction accuracy. Furthermore, we compare five distinct hybrid models by assessing their strengths and limitations. To ensure that our experiments reflect real-world conditions, we use datasets from Taiwan and California. The experimental results reveal that the data preprocessing phase, including feature editing, plays a pivotal role in TTP accuracy. Additionally, base models such as Long Short-Term Memory (LSTM) and eXtreme Gradient Boosting (XGBoost) outperform all hybrid models on real-world datasets. Based on these insights, we propose a baseline that fuses the complementary strengths of XGBoost and LSTM via a gating network. This approach dynamically allocates a weight to each model, guided by key statistical features, enabling the combined model to adapt robustly to both stable and volatile traffic conditions and to achieve superior prediction accuracy compared to existing methods. By breaking down the TTP process and analyzing each component, this study provides insights into the factors that affect prediction accuracy most significantly, thereby offering guidance and a foundation for developing more effective TTP methods in the future.
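The gating idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the gate's architecture, its input statistics, or which base model is favored under which traffic regime. The sketch assumes a linear softmax gate over three window statistics (mean, standard deviation, range), uses hand-picked untrained parameters `W` and `b`, and substitutes fixed numbers for the XGBoost and LSTM predictions. All function and variable names are hypothetical.

```python
import numpy as np

def window_stats(recent):
    """Summary statistics of a recent travel-time window (assumed gate inputs)."""
    recent = np.asarray(recent, dtype=float)
    return np.array([recent.mean(), recent.std(), recent.max() - recent.min()])

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def gated_prediction(pred_a, pred_b, stats, W, b):
    """Blend two base-model predictions with weights from a linear softmax gate."""
    weights = softmax(W @ stats + b)  # shape (2,), non-negative, sums to 1
    fused = weights[0] * pred_a + weights[1] * pred_b
    return fused, weights

# Hypothetical, untrained gate parameters chosen only for illustration.
# This gate looks at the standard deviation alone: low variance favors
# model A, high variance favors model B. That mapping is an assumption.
W = np.array([[0.0, -0.05, 0.0],
              [0.0,  0.05, 0.0]])
b = np.array([2.0, -2.0])

stable = [300, 302, 299, 301]    # travel times in seconds, low variance
volatile = [300, 420, 280, 510]  # travel times in seconds, high variance

# Stand-in predictions; in practice these would come from the trained
# XGBoost (model A) and LSTM (model B) base models.
pred_a, pred_b = 310.0, 330.0

for name, window in [("stable", stable), ("volatile", volatile)]:
    fused, w = gated_prediction(pred_a, pred_b, window_stats(window), W, b)
    print(name, w.round(3), round(float(fused), 1))
```

Because the gate outputs a convex combination, the fused prediction always lies between the two base predictions. In a trained system the parameters `W` and `b` would be learned jointly with, or on top of, the base models rather than set by hand.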