Department of Industrial Engineering & Management, Ben-Gurion University of the Negev, Israel.
Accid Anal Prev. 2019 Aug;129:350-361. doi: 10.1016/j.aap.2019.04.016. Epub 2019 Jun 12.
While young drivers (YDs) constitute ∼10% of the driver population, their fatality rate in motorcycle accidents is up to three times higher. We therefore aim to predict fatal motorcycle accidents (FMAs) and to identify their key factors and possible causes. Accurate prediction of YD FMAs cannot be guaranteed by minimizing risk under the 0/1 loss function (i.e., by maximizing ordinary classification accuracy). These accidents make up only ∼1% of all YD motorcycle accidents, so classifiers tend to focus on the majority class of minor accidents at the expense of the minority class of fatal ones. Moreover, classifiers are usually uninformative (they provide no information about the distribution of misclassifications), insensitive to error severity (they do not distinguish between misclassifying a fatal accident as severe and misclassifying it as minor), and limited in their ability to identify key factors. We propose an information measure (IM) that jointly maximizes accuracy and information and is sensitive to both the distribution and the severity of errors. On a database of ∼3600 motorcycle accidents, a Bayesian network classifier optimized by IM predicted FMAs better than classifiers that maximize accuracy or other predictive or information measures, and it identified key factors and causal relations for fatal accidents.
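To make the imbalance argument concrete, the sketch below compares two classifiers on a three-class problem (fatal, severe, minor) using accuracy, fatal-class recall, the mutual information between true and predicted labels, and an expected misclassification cost. Everything in it is illustrative and assumed: the confusion-matrix counts are invented (chosen only to total 3600 cases with ∼1% fatal), the cost matrix is hypothetical, and mutual information is a standard information measure used here as a stand-in. It is not the authors' IM, whose exact definition is not given in this abstract.

```python
import math

# Hypothetical class order (rows = true class, columns = predicted class).
CLASSES = ["fatal", "severe", "minor"]

# Invented counts: 36 fatal, 360 severe, 3204 minor (3600 cases, ~1% fatal).
MAJORITY = [          # always predicts "minor"
    [0, 0, 36],
    [0, 0, 360],
    [0, 0, 3204],
]
REBALANCED = [        # trades some accuracy for catching fatal cases
    [24, 8, 4],
    [30, 250, 80],
    [100, 300, 2804],
]

# Hypothetical severity costs: missing a fatal case as "minor" is worst.
COST = [
    [0, 5, 10],
    [1, 0, 5],
    [1, 1, 0],
]


def accuracy(cm):
    """Fraction of cases on the diagonal (ordinary 0/1-loss accuracy)."""
    total = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / total


def recall(cm, k):
    """Share of true class k that was predicted as class k."""
    return cm[k][k] / sum(cm[k])


def mutual_information(cm):
    """I(true; predicted) in bits, estimated from confusion-matrix counts."""
    total = sum(map(sum, cm))
    p_true = [sum(row) / total for row in cm]
    p_pred = [sum(cm[i][j] for i in range(len(cm))) / total
              for j in range(len(cm[0]))]
    mi = 0.0
    for i, row in enumerate(cm):
        for j, n in enumerate(row):
            if n:  # empty cells contribute nothing
                p = n / total
                mi += p * math.log2(p / (p_true[i] * p_pred[j]))
    return mi


def expected_cost(cm, cost):
    """Average severity-weighted misclassification cost per case."""
    total = sum(map(sum, cm))
    return sum(cm[i][j] * cost[i][j]
               for i in range(len(cm)) for j in range(len(cm[0]))) / total


for name, cm in [("majority", MAJORITY), ("rebalanced", REBALANCED)]:
    print(f"{name:10s} acc={accuracy(cm):.3f} "
          f"fatal_recall={recall(cm, 0):.3f} "
          f"MI={mutual_information(cm):.3f} bits "
          f"cost={expected_cost(cm, COST):.3f}")
```

With these made-up counts, the majority-class predictor has the higher accuracy (0.890 vs. 0.855) yet detects no fatal accidents, carries zero mutual information about the true class, and incurs the higher expected cost (0.600 vs. about 0.253). Accuracy alone would therefore prefer the less useful classifier, which is the failure mode the abstract describes.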