Li Jinhong, Liu Jinli, Liu Pengfei, Qi Yi
School of Mathematics and Statistics, Qilu University of Technology (Shandong Academy of Sciences), University Road 3501, Jinan 250353, China.
Department of Transportation Studies, Texas Southern University, 3100 Cleburne Street, Houston, TX 77004-9986, USA.
Entropy (Basel). 2020 Oct 22;22(11):1191. doi: 10.3390/e22111191.
Crashes involving large trucks often result in immense human, economic, and social losses. To prevent and mitigate severe large truck crashes, the factors contributing to their severity need to be identified before appropriate countermeasures can be explored. In this research, we applied three tree-based machine learning (ML) techniques, i.e., random forest (RF), gradient boosting decision tree (GBDT), and adaptive boosting (AdaBoost), to analyze the factors contributing to the severity of large truck crashes. In addition, a mixed logit model was developed as a baseline against which the factors identified by the ML models were compared. The analysis was based on crash data collected from the Texas Crash Records Information System (CRIS) from 2011 to 2015. The results demonstrated that the GBDT model outperformed the other ML methods both in prediction accuracy and in its ability to identify more of the contributing factors that the mixed logit model also found significant. Moreover, the GBDT method can effectively identify both categorical and numerical factors, and the directions and magnitudes of the impacts of the factors it identified are all reasonable and explainable. Among the identified factors, driving under the influence of drugs or alcohol and driving while fatigued are the most important contributors to the severity of large truck crashes. In addition, the presence of curbs and medians, and of lanes and shoulders of sufficient width, can prevent severe large truck crashes.
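The kind of comparison described above can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic data in place of the CRIS records, which are not reproduced here. The three severity classes, the ten features, the hyperparameters, and the use of impurity-based feature importances to rank factors are all illustrative assumptions, not the authors' actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for crash records (assumption): 600 crashes,
# 10 candidate factors, 3 hypothetical severity levels.
X, y = make_classification(
    n_samples=600,
    n_features=10,
    n_informative=5,
    n_redundant=0,
    n_classes=3,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# The three tree-based ensembles named in the abstract, with
# illustrative (not the authors') hyperparameters.
models = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "GBDT": GradientBoostingClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    # Indices of the three features with the largest importance scores.
    top_factors = model.feature_importances_.argsort()[::-1][:3].tolist()
    results[name] = (accuracy, top_factors)
    print(f"{name}: accuracy={accuracy:.3f}, top factor indices={top_factors}")
```

On real crash data, the ranked factors from each model would then be compared against the variables that a separately fitted mixed logit model finds statistically significant; that step is omitted here.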