Ahmad Numan, Wali Behram, Khattak Asad J
Department of Civil & Environmental Engineering, The University of Tennessee, Knoxville, TN 37996, USA.
Urban Design 4 Health, Inc., 24 Jackie Circle, East Rochester, NY 14612, USA.
J Safety Res. 2023 Feb;84:418-434. doi: 10.1016/j.jsr.2022.12.005. Epub 2022 Dec 14.
This study aims to improve the accuracy of crash-frequency predictions for roadway segments, so that future safety on roadway facilities can be forecast more reliably. A variety of statistical and machine learning (ML) methods are used to model crash frequency, with ML methods generally achieving higher prediction accuracy. Recently, heterogeneous ensemble methods (HEM), including "stacking," have emerged as robust techniques that provide more reliable and accurate predictions.
This study applies "stacking" to model crash frequency on five-lane undivided (5T) segments of urban and suburban arterials. The prediction performance of "stacking" is compared with that of two parametric statistical models (Poisson and negative binomial) and three state-of-the-art ML techniques (decision tree, random forest, and gradient boosting), each of which is termed a base-learner. Stacking combines the individual base-learners through an optimal weighting scheme, which avoids the biased predictions that individual base-learners can produce because of differences in specification and prediction accuracy. Crash, traffic, and roadway inventory data from 2013 to 2017 were collected and integrated. The data are split into training (2013-2015), validation (2016), and testing (2017) datasets. The five base-learners are first trained on the training data; their predictions on the validation data are then used to train a meta-learner.
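The train/validate/stack procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the data are synthetic, three toy base-learners stand in for the five used in the study, and the meta-learner is assumed to be a grid search over non-negative weights that sum to one. All function and variable names (`fit_mean`, `fit_linear`, `fit_stump`, `fit_stack_weights`, and so on) are hypothetical.

```python
import math
import random
from itertools import product

def poisson(lam, rng):
    """Draw one Poisson count with Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def make_split(n, rng):
    """Synthetic segments (assumed): [driveway density, offset distance] -> crashes."""
    X = [[rng.uniform(0, 30), rng.uniform(2, 20)] for _ in range(n)]
    y = [poisson(math.exp(0.2 + 0.05 * d - 0.04 * o), rng) for d, o in X]
    return X, y

def mean(values):
    return sum(values) / len(values)

def mse(pred, y):
    return mean([(p - t) ** 2 for p, t in zip(pred, y)])

def fit_mean(X, y):
    """Toy base-learner 1: always predict the training mean."""
    m = mean(y)
    return lambda x: m

def fit_linear(X, y, j=0):
    """Toy base-learner 2: one-variable least squares on feature j."""
    xs = [row[j] for row in X]
    mx, my = mean(xs), mean(y)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, y))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x[j] - mx)

def fit_stump(X, y, j=1):
    """Toy base-learner 3: depth-one tree split at the median of feature j."""
    cut = sorted(row[j] for row in X)[len(X) // 2]
    left = [t for row, t in zip(X, y) if row[j] < cut]
    right = [t for row, t in zip(X, y) if row[j] >= cut]
    lm = mean(left) if left else mean(y)
    rm = mean(right) if right else mean(y)
    return lambda x: lm if x[j] < cut else rm

def blend(weights, preds):
    """Weighted average of the base-learner predictions, row by row."""
    return [sum(w * p[i] for w, p in zip(weights, preds))
            for i in range(len(preds[0]))]

def fit_stack_weights(val_preds, y_val, steps=20):
    """Assumed meta-learner: grid search over weights >= 0 that sum to 1."""
    best_w, best_err = None, float("inf")
    for combo in product(range(steps + 1), repeat=len(val_preds) - 1):
        if sum(combo) > steps:
            continue
        w = [c / steps for c in combo] + [(steps - sum(combo)) / steps]
        err = mse(blend(w, val_preds), y_val)
        if err < best_err:
            best_w, best_err = w, err
    return best_w

rng = random.Random(0)
X_tr, y_tr = make_split(600, rng)   # stands in for the 2013-2015 training data
X_va, y_va = make_split(200, rng)   # stands in for the 2016 validation data
X_te, y_te = make_split(200, rng)   # stands in for the 2017 testing data

# Step 1: train each base-learner on the training split only.
learners = [fit_mean(X_tr, y_tr), fit_linear(X_tr, y_tr), fit_stump(X_tr, y_tr)]

# Step 2: train the meta-learner on the base-learners' validation predictions.
val_preds = [[f(x) for x in X_va] for f in learners]
weights = fit_stack_weights(val_preds, y_va)
base_val_mse = [mse(p, y_va) for p in val_preds]
stacked_val_mse = mse(blend(weights, val_preds), y_va)

# Step 3: compare out-of-sample error on the held-out testing split.
test_preds = [[f(x) for x in X_te] for f in learners]
base_test_mse = [mse(p, y_te) for p in test_preds]
stacked_test_mse = mse(blend(weights, test_preds), y_te)
print("weights:", weights)
print("base test MSE:", base_test_mse, "stacked test MSE:", stacked_test_mse)
```

Because the weight grid includes the corner cases that put all weight on a single base-learner, the stacked model can never do worse than the best base-learner on the validation data. Whether that advantage carries over to the testing data is an empirical question, which is why a separate testing year is held out.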
The statistical models show that crash frequency increases with the density (number per mile) of commercial driveways and decreases as the average offset distance to fixed objects increases. The individual ML methods yield similar results in terms of variable importance. A comparison of out-of-sample predictions confirms the superiority of "stacking" over the alternative methods considered.
From a practical standpoint, "stacking" can enhance prediction accuracy compared with relying on a single base-learner with a particular specification. When applied systemically, stacking can help identify more appropriate countermeasures.