Department of Civil Engineering, The Hashemite University, 13115 Zarqa, Jordan.
Department of Civil Engineering, University of Granada, ETSI Caminos, Canales y Puertos, c/ Severo Ochoa, s/n, 18071 Granada, Spain.
Accid Anal Prev. 2016 Mar;88:37-51. doi: 10.1016/j.aap.2015.12.003. Epub 2015 Dec 20.
Traffic accident data sets are usually imbalanced: the number of instances in the killed or severe injuries class (minority) is much lower than the number in the slight injuries class (majority). This imbalance poses a challenging problem for classification algorithms and may yield a model that covers the slight injuries instances well while frequently misclassifying the killed or severe injuries instances. Based on traffic accident data collected on urban and suburban roads in Jordan over three years (2009-2011), three data balancing techniques were used: under-sampling, which removes some instances of the majority class; oversampling, which creates new instances of the minority class; and a mixed technique that combines both. In addition, different Bayes classifiers were compared on the imbalanced and balanced data sets in order to identify factors that affect the severity of an accident: Averaged One-Dependence Estimators, Weightily Averaged One-Dependence Estimators, and Bayesian networks. The results indicated that using the balanced data sets, especially those created with oversampling techniques, together with Bayesian networks improved the classification of a traffic accident according to its severity and reduced the misclassification of killed and severe injuries instances. In addition, the following variables were found to contribute to the occurrence of a fatality or a severe injury in a traffic accident: number of vehicles involved, accident pattern, number of directions, accident type, lighting, surface condition, and speed limit. This work, to the knowledge of the authors, is the first that aims at analyzing historical data records for traffic accidents occurring in Jordan and the first to apply balancing techniques to analyze injury severity of traffic accidents.
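The three balancing strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not state which specific resampling algorithms were used, so this example assumes plain random resampling: under-sampling drops random majority instances, oversampling duplicates random minority instances (rather than synthesizing new ones), and the mixed variant moves both classes to the midpoint size. The function names, the `severity` field, and the 90/10 toy data are all hypothetical and are not taken from the study.

```python
import random
from collections import Counter

def split_by_class(records, label_key):
    """Group records by their class label."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[label_key], []).append(rec)
    return groups

def undersample(records, label_key, rng):
    """Randomly drop instances so every class matches the smallest class."""
    groups = split_by_class(records, label_key)
    target = min(len(g) for g in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, target))
    return balanced

def oversample(records, label_key, rng):
    """Randomly duplicate instances so every class matches the largest class."""
    groups = split_by_class(records, label_key)
    target = max(len(g) for g in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced

def mixed_sample(records, label_key, rng):
    """Shrink large classes and grow small ones to the midpoint size."""
    groups = split_by_class(records, label_key)
    sizes = [len(g) for g in groups.values()]
    target = (min(sizes) + max(sizes)) // 2
    balanced = []
    for group in groups.values():
        if len(group) >= target:
            balanced.extend(rng.sample(group, target))
        else:
            balanced.extend(group)
            balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced

def class_counts(records, label_key):
    """Return a plain dict mapping each class label to its instance count."""
    return dict(Counter(rec[label_key] for rec in records))

# Hypothetical toy data: 90 slight-injury records, 10 killed/severe (KSI).
records = [{"severity": "slight"} for _ in range(90)]
records += [{"severity": "ksi"} for _ in range(10)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
print(class_counts(undersample(records, "severity", rng), "severity"))
print(class_counts(oversample(records, "severity", rng), "severity"))
print(class_counts(mixed_sample(records, "severity", rng), "severity"))
```

On the toy data, the three calls give class sizes of 10/10, 90/90, and 50/50 respectively. In practice, resampling would be applied to the training split only, so that the evaluation data keeps the original class distribution.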