Department of Civil and Environmental Engineering, Florida International University, 10555 West Flagler Street, EC 3680, Miami, FL 33174, United States.
Accid Anal Prev. 2015 Jun;79:133-44. doi: 10.1016/j.aap.2015.03.011. Epub 2015 Mar 28.
The Highway Safety Manual (HSM) recommends using the empirical Bayes (EB) method with locally derived calibration factors to predict an agency's safety performance. However, deriving these local calibration factors requires substantial data, including very detailed roadway characteristics information. Many of the data variables identified in the HSM are currently unavailable in state databases, and collecting and maintaining all of them is cost-prohibitive. Prioritizing the variables by their impact on crash predictions would therefore help identify the influential variables for which data should be collected and kept up to date. This study aims to determine the impact of each independent variable identified in the HSM on crash predictions. A relatively recent data mining approach called boosted regression trees (BRT) is used to investigate the association between the variables and crash predictions. The BRT method can effectively handle different types of predictor variables, identify very complex and non-linear associations among variables, and compute variable importance. Five years of crash data, from 2008 to 2012, on two urban and suburban facility types (two-lane undivided arterials and four-lane divided arterials) were analyzed to estimate the influence of variables on crash predictions. Variables were found to exhibit non-linear and sometimes complex relationships with predicted crash counts. In addition, only a few variables were found to explain most of the variation in the crash data.
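To illustrate how a boosted regression tree model can rank variables by importance, the following is a minimal sketch only, not the study's implementation. It assumes squared-error loss and depth-one trees (stumps), and it runs on synthetic data. The variable names (`aadt`, `driveway_density`, `lane_width`), the data-generating formula, and the functions `fit_stump` and `fit_brt` are all hypothetical and chosen for illustration. Relative importance is taken here as each variable's share of the total squared-error reduction across all splits, which is one common definition.

```python
import random


def fit_stump(X, r):
    """Return the single split (gain, feature, threshold, left_mean, right_mean)
    that most reduces the squared error of the residuals r, or None."""
    n = len(X)
    total = sum(r)
    best = None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += r[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between two equal values
            right_sum = total - left_sum
            # Reduction in the sum of squared errors produced by this split.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best


def fit_brt(X, y, n_trees=200, learning_rate=0.1):
    """Gradient boosting with stumps under squared-error loss (illustrative)."""
    n = len(y)
    f0 = sum(y) / n                      # initial prediction: the mean
    pred = [f0] * n
    stumps = []
    importance = [0.0] * len(X[0])
    for _ in range(n_trees):
        residuals = [y[i] - pred[i] for i in range(n)]
        best = fit_stump(X, residuals)
        if best is None:
            break
        gain, j, thr, left_val, right_val = best
        importance[j] += gain            # credit the split variable
        stumps.append((j, thr, left_val, right_val))
        for i in range(n):
            step = left_val if X[i][j] <= thr else right_val
            pred[i] += learning_rate * step
    total = sum(importance)
    rel = [100.0 * v / total for v in importance] if total > 0 else importance
    return f0, stumps, rel


# Synthetic road segments (all values and effects are assumptions).
random.seed(0)
names = ["aadt", "driveway_density", "lane_width"]
X, y = [], []
for _ in range(300):
    aadt = random.uniform(5, 40)              # thousand vehicles per day
    driveway_density = random.uniform(0, 30)  # driveways per mile
    lane_width = random.uniform(10, 13)       # feet; no effect by construction
    crashes = 0.5 * aadt ** 0.8 + 0.1 * driveway_density + random.gauss(0, 0.5)
    X.append([aadt, driveway_density, lane_width])
    y.append(crashes)

f0, stumps, rel = fit_brt(X, y)
for name, share in sorted(zip(names, rel), key=lambda t: -t[1]):
    print(f"{name}: {share:.1f}% of total squared-error reduction")
```

In this synthetic setup the variable with the strongest built-in effect should receive most of the importance, mirroring the abstract's observation that a few variables explain most of the variation. A real analysis of crash counts would more likely use a count-appropriate loss (for example Poisson deviance), deeper trees, and cross-validated tuning of the number of trees and the learning rate.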