School of Transportation and Logistics, Southwest Jiaotong University, Chengdu, China.
National Engineering Laboratory of Integrated Transportation Big Data Application Technology, Chengdu, China.
Int J Inj Contr Saf Promot. 2020 Sep;27(3):266-275. doi: 10.1080/17457300.2020.1746814. Epub 2020 Apr 1.
The quality of vehicular collision data is crucial for studying the relationship between injury severity and collision factors. Misclassified injury severity records in a crash dataset, however, can produce inaccurate parameter estimates and consequently lead to biased conclusions and poorly designed countermeasures. This is particularly true for imbalanced data, where the samples in one class far outnumber those in the other. To improve the classification performance for injury severity, this paper presents a robust noise filtering technique, M-IPF, that uses machine learning algorithms to handle mislabels in an imbalanced crash dataset. We examine state-of-the-art filtering algorithms, including Iterative Noise Filtering based on the Fusion of Classifiers (INFFC), the Iterative Partitioning Filter (IPF), and the Saturation Filter (SatF). In a case study of Cairo, Egypt, the empirical results show that (1) mislabels in crash data significantly influence injury severity predictions, and (2) the proposed M-IPF filter outperforms its counterparts in both effectiveness and efficiency in eliminating mislabels from crash data. The test results demonstrate the efficacy of M-IPF in handling data noise and mitigating its impact.
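For readers unfamiliar with partition-based label filtering, the following is a minimal sketch of the general idea behind an iterative partitioning filter. It is not the M-IPF method of this paper, whose details the abstract does not give. The sketch assumes numeric features and swaps in a simple nearest-centroid classifier as the base learner purely to stay dependency-free. The function names `centroid_classifier` and `partition_filter` and the parameters `n_parts`, `p`, `k`, and `max_iter` are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def centroid_classifier(rows, labels):
    """Fit a nearest-centroid model (an assumed stand-in base learner)."""
    counts = Counter(labels)
    sums = {}
    for row, lab in zip(rows, labels):
        acc = sums.setdefault(lab, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def predict(row):
        # Return the label whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, centroids[lab])),
        )

    return predict


def partition_filter(rows, labels, n_parts=5, p=0.01, k=3, max_iter=20, seed=0):
    """Return indices of samples kept after iterative majority-vote filtering.

    Each round: split the remaining data into n_parts subsets, train one model
    per subset, let every model vote on every remaining sample, and drop the
    samples that a majority of models misclassify. Stop once fewer than
    p * len(rows) samples were flagged in k consecutive rounds.
    """
    rng = random.Random(seed)
    keep = list(range(len(rows)))
    threshold = p * len(rows)
    quiet_rounds = 0
    for _ in range(max_iter):
        if quiet_rounds >= k:
            break
        shuffled = keep[:]
        rng.shuffle(shuffled)
        parts = [shuffled[i::n_parts] for i in range(n_parts)]
        models = [
            centroid_classifier([rows[i] for i in part], [labels[i] for i in part])
            for part in parts
            if part
        ]
        noisy = {
            i
            for i in keep
            if sum(m(rows[i]) != labels[i] for m in models) > len(models) / 2
        }
        keep = [i for i in keep if i not in noisy]
        quiet_rounds = quiet_rounds + 1 if len(noisy) < threshold else 0
    return keep
```

Majority voting is used here because it removes more suspected mislabels than consensus voting, at the cost of occasionally discarding clean samples. On imbalanced data such as crash records, that trade-off deserves care, since minority-class samples are the easiest to flag by mistake.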