Jiangsu Key Laboratory of Urban ITS, Southeast University, Nanjing, 210096, China.
Intelligent Transportation Research Center, Southeast University, Nanjing, 210096, China.
Int J Environ Res Public Health. 2019 Jan 25;16(3):334. doi: 10.3390/ijerph16030334.
The objective of this paper is to predict the future driving risk of crash-involved drivers in Kunshan, China. A systematic machine learning framework is proposed to address three critical technical issues: (1) defining driving risk; (2) developing risky driving factors; and (3) developing a reliable and explicable machine learning model. High-risk (HR) and low-risk (LR) drivers were defined under five different scenarios. A number of features were extracted from seven years of crash/violation records. Drivers' crash/violation information from the prior two years was used to predict their driving risk in the subsequent two years. Using a one-year rolling time window, prediction models were developed for four consecutive time periods: 2013-2014, 2014-2015, 2015-2016, and 2016-2017. Four tree-based ensemble learning techniques were tested: random forest (RF), AdaBoost with decision trees, gradient boosting decision tree (GBDT), and extreme gradient boosting decision tree (XGBoost). A temporal transferability test and a follow-up study were applied to validate the trained models. The best scenario for defining driving risk was multi-dimensional, encompassing crash recurrence, severity, and fault commitment. GBDT appeared to be the best model choice across all time periods, with an acceptable average precision (AP) of 0.68 on the most recent datasets (i.e., 2016-2017). Seven of the nine top features were related to risky driving behaviors and showed non-linear relationships with driving risk. Model transferability held within relatively short time intervals (1-2 years). An appropriate risk definition, complicated violation/crash features, and advanced machine learning techniques need to be considered for the risk prediction task. The proposed machine learning approach is promising and could allow safety interventions to be launched more effectively.
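The rolling-window design described in the abstract can be illustrated with a minimal Python sketch. It assumes that the four named periods (2013-2014 through 2016-2017) are the two-year prediction periods and that each is paired with the two years immediately before it as the feature period; the abstract does not state this pairing explicitly. The names `Window` and `rolling_windows` are hypothetical and are not taken from the paper.

```python
from typing import List, NamedTuple, Tuple


class Window(NamedTuple):
    """One train/predict split: inclusive (first_year, last_year) ranges."""
    feature_years: Tuple[int, int]  # years used to build crash/violation features
    target_years: Tuple[int, int]   # years in which driving risk is observed


def rolling_windows(
    first_target_year: int = 2013,
    last_target_year: int = 2017,
    feature_span: int = 2,
    target_span: int = 2,
    step: int = 1,
) -> List[Window]:
    """Enumerate windows, sliding forward by `step` years each time.

    Assumption: the feature period ends the year before the target
    period starts, with no gap between the two.
    """
    windows = []
    start = first_target_year
    # Stop once the target period would run past the last available year.
    while start + target_span - 1 <= last_target_year:
        windows.append(
            Window(
                feature_years=(start - feature_span, start - 1),
                target_years=(start, start + target_span - 1),
            )
        )
        start += step
    return windows


if __name__ == "__main__":
    for w in rolling_windows():
        print(w)
```

Under these assumptions the sketch yields four windows, and the earliest feature year (2011) through the latest target year (2017) spans seven years, which is consistent with the seven-year record length mentioned in the abstract.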