Wan Ming, Wu Qian, Yan Lixin, Guo Junhua, Li Wenxia, Lin Wei, Lu Shan
School of Transportation Engineering, East China Jiaotong University, Nanchang, China.
Traffic Administration Bureau of Nanchang Public Security Bureau, Nanchang, China.
Traffic Inj Prev. 2023;24(4):362-370. doi: 10.1080/15389588.2023.2191286. Epub 2023 Mar 28.
To explore the effects of several key factors on taxi drivers' traffic violations and to provide traffic management departments with a scientific basis for decisions that reduce traffic fatalities and injuries.
A total of 43,458 electronic enforcement records of taxi drivers' traffic violations, collected in Nanchang City, Jiangxi Province, China, from July 1, 2020, to June 30, 2021, were used to explore the characteristics of these violations. A random forest algorithm was used to predict the severity of the violations, and 11 influencing factors, covering time, road conditions, environment, and taxi company, were analyzed with the SHapley Additive exPlanations (SHAP) framework.
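The abstract does not show how SHAP values are obtained. As a rough illustration of what the framework attributes, the sketch below computes exact Shapley values by brute force over feature coalitions, filling in absent features from a single baseline row. That value function, the toy scoring model, and the three binary features (standing in for functional district, violation location, and road grade) are all assumptions, not the authors' setup; a random forest with 11 features would normally be explained with a tree-specific algorithm such as TreeSHAP rather than by enumerating coalitions. Global importance is commonly summarized as the mean absolute SHAP value per feature, although the abstract does not state which averaging it used.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one instance x (hypothetical helper).

    Features outside a coalition take their baseline value, an assumed
    single-reference value function used here only for illustration.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_shap(model, rows, baseline):
    """Global importance: mean |phi_i| over all explained rows."""
    totals = [0.0] * len(baseline)
    for x in rows:
        for i, p in enumerate(shapley_values(model, x, baseline)):
            totals[i] += abs(p)
    return [t / len(rows) for t in totals]


if __name__ == "__main__":
    # Hypothetical violation-score model over three binary features:
    # z[0] functional district, z[1] violation location, z[2] road grade.
    model = lambda z: 0.5 * z[0] + 0.3 * z[1] + 0.2 * z[0] * z[2]
    phi = shapley_values(model, [1, 1, 1], [0, 0, 0])
    print([round(p, 6) for p in phi])  # [0.6, 0.3, 0.1]
```

The interaction term 0.2·z[0]·z[2] is split evenly between features 0 and 2, and the values sum to model(x) minus model(baseline), which is the efficiency property that makes SHAP attributions additive.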
First, the ensemble method Balanced Bagging Classifier (BBC) was applied to balance the dataset; the imbalance ratio (IR) of the original imbalanced dataset was reduced from 6.61% to 2.60%. Next, a Random Forest model was built to predict the severity of taxi drivers' traffic violations, achieving an accuracy, m_F1, m_G-mean, m_AUC, and m_AP of 0.877, 0.849, 0.599, 0.976, and 0.957, respectively. Compared with Decision Tree, XGBoost, AdaBoost, and Neural Network models, the Random Forest model achieved the best performance measures. Finally, the SHAP framework was used to improve the interpretability of the model and to identify the important factors affecting taxi drivers' traffic violations. Functional district, violation location, and road grade had a high impact on the probability of traffic violations, with mean SHAP values of 0.39, 0.36, and 0.26, respectively.
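The abstract names the balancing method and the evaluation metrics without defining them. The sketch below illustrates three of these ingredients under stated assumptions: an imbalance ratio taken as largest-class count over smallest-class count (the abstract reports IR with a percent sign and gives no definition), one class-balanced bootstrap bag of the kind a balanced bagging ensemble trains each base learner on, and macro-averaged F1 together with G-mean as the geometric mean of per-class recalls (assuming the "m_" prefix denotes macro averaging). All function names are hypothetical; in practice, imbalanced-learn's `BalancedBaggingClassifier` and scikit-learn's metrics module are the usual tools.

```python
import random
from collections import Counter
from math import prod


def imbalance_ratio(labels):
    """Assumed definition: largest class count / smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def balanced_bootstrap_bag(X, y, rng):
    """Draw one class-balanced bag (hypothetical helper).

    Every class is sampled with replacement down to the size of the
    smallest class, so each base learner (e.g. one tree) sees a
    balanced training set.
    """
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(rows) for rows in by_class.values())
    bag = []
    for label, rows in by_class.items():
        bag.extend((rng.choice(rows), label) for _ in range(n_min))
    return bag


def macro_f1_and_gmean(y_true, y_pred):
    """Macro-averaged F1 and G-mean (geometric mean of per-class recalls)."""
    pairs = list(zip(y_true, y_pred))
    f1s, recalls = [], []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn)  # tp + fn > 0 because c occurs in y_true
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1s.append(f1)
        recalls.append(recall)
    return sum(f1s) / len(f1s), prod(recalls) ** (1 / len(recalls))


if __name__ == "__main__":
    y = [0] * 6 + [1] * 2
    print(imbalance_ratio(y))  # 3.0
    bag = balanced_bootstrap_bag(list(range(8)), y, random.Random(0))
    print(imbalance_ratio([label for _, label in bag]))  # 1.0
    print(macro_f1_and_gmean([0, 0, 1, 1, 2, 2], [0, 0, 1, 0, 2, 2]))
```

Because the G-mean multiplies per-class recalls, a single poorly recalled class pulls it down sharply even when accuracy and macro F1 stay high.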
The findings of this paper may help reveal the relationship between the influencing factors and the severity of traffic violations, and provide a theoretical basis for reducing taxi drivers' traffic violations and improving road safety management.