Effectiveness of resampling methods in coping with imbalanced crash data: Crash type analysis and predictive modeling.

Affiliation

College of Engineering, University of Georgia, Athens, GA 30602, USA.

Publication information

Accid Anal Prev. 2021 Sep;159:106240. doi: 10.1016/j.aap.2021.106240. Epub 2021 Jun 16.

DOI: 10.1016/j.aap.2021.106240

PMID: 34144225

Abstract

Crash data analysis commonly has to contend with imbalanced data. Depending on facility and control type, some crash types are more frequent than others. However, uncommon crash types are often more severe and carry higher economic and societal costs, which makes them crucial to prevent. It is therefore important to develop inferential models that can reliably predict crash types and identify contributing factors, especially for the severe types. Current practice in modeling infrequent events generally disregards the disparity in data representation, which can lead to biased models. Mitigating and managing imbalanced data is thus essential to developing meaningful and robust models that help reveal effective countermeasures. This study compares the effects of resampling techniques on the performance of both machine learning and classical statistical models for classifying and predicting crash types on freeways. Specifically, a mixed sampling approach that couples cluster-based under-sampling with three popular over-sampling methods (random over-sampling, synthetic minority over-sampling, and adaptive synthetic sampling) was investigated with respect to four crash classification models: three ensemble machine learning models (CatBoost, XGBoost, and Random Forests) and one classical statistical model (Nested Logit). The study concluded that all three resampling methods consistently enhanced the performance of all models. Among the three over-sampling methods, adaptive synthetic sampling performed best, greatly improving the prediction of minority crash types without impeding the prediction of the majority crash type. This is likely because adaptive synthetic sampling takes a density-based approach, creating synthetic instances that are more congruent with the underlying manifold structure of the high-dimensional feature space.
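
To make the density-based idea behind adaptive synthetic sampling concrete, the following is a minimal sketch in plain Python. It is not the study's implementation: the function name `adasyn_sketch`, the toy 2-D points, and the choice of `k=3` are all assumptions made for illustration. Each minority point is scored by the share of majority points among its k nearest neighbours, and synthetic points are allocated in proportion to that score, so minority points near the class boundary receive most of the new samples.

```python
import math
import random


def adasyn_sketch(minority, majority, n_new, k=3, seed=0):
    """Generate roughly n_new synthetic minority points, ADASYN-style.

    Hypothetical helper for illustration only; a simplified sketch of
    adaptive synthetic sampling, not the code used in the study.
    """
    rng = random.Random(seed)
    # Label 0 = minority, 1 = majority, so labels can be summed directly.
    labelled = [(p, 0) for p in minority] + [(p, 1) for p in majority]

    # Step 1: difficulty of each minority point = share of majority
    # points among its k nearest neighbours in the full data set.
    ratios = []
    for i, x in enumerate(minority):
        others = labelled[:i] + labelled[i + 1:]
        nearest = sorted(others, key=lambda pl: math.dist(x, pl[0]))[:k]
        ratios.append(sum(label for _, label in nearest) / k)

    # Step 2: normalise the difficulties into a density distribution
    # and allocate the synthetic points in proportion to it.
    total = sum(ratios)
    if total == 0:  # no minority point has a majority neighbour
        weights = [1 / len(minority)] * len(minority)
    else:
        weights = [r / total for r in ratios]
    counts = [round(w * n_new) for w in weights]

    # Step 3: create each synthetic point by interpolating between a
    # minority point and one of its k nearest minority neighbours.
    synthetic = []
    for i, (x, count) in enumerate(zip(minority, counts)):
        mates = sorted(minority[:i] + minority[i + 1:],
                       key=lambda p: math.dist(x, p))[:k]
        for _ in range(count):
            mate = rng.choice(mates)
            gap = rng.random()
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, mate)))
    return counts, synthetic


# Toy data (assumed): three minority points far from the majority class
# and two minority points sitting next to a majority cluster.
minority = [(0.0, 0.0), (0.3, 0.1), (0.1, 0.4), (1.5, 1.6), (2.0, 2.0)]
majority = [(2.2, 2.0), (2.0, 2.3), (1.8, 2.1),
            (4.0, 4.0), (4.2, 3.9), (3.9, 4.3)]

counts, synthetic = adasyn_sketch(minority, majority, n_new=10)
print(counts)  # → [0, 0, 0, 4, 6]
```

In this toy example all ten synthetic points are generated around the two minority points that border the majority cluster, whereas random over-sampling would duplicate all five minority points with equal probability. Production work would more likely rely on an established library than on a hand-rolled version like this one.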

Similar articles

Crash injury severity prediction considering data imbalance: A Wasserstein generative adversarial network with gradient penalty approach.

Accid Anal Prev. 2023 Nov;192:107271. doi: 10.1016/j.aap.2023.107271. Epub 2023 Aug 31.

Class-imbalanced crash prediction based on real-time traffic and weather data: A driving simulator study.

Traffic Inj Prev. 2020;21(3):201-208. doi: 10.1080/15389588.2020.1723794. Epub 2020 Mar 3.

Comparison of four statistical and machine learning methods for crash severity prediction.

Accid Anal Prev. 2017 Nov;108:27-36. doi: 10.1016/j.aap.2017.08.008. Epub 2017 Sep 6.

A resampling approach to disaggregate analysis of bus-involved crashes using panel data with excessive zeros.

Accid Anal Prev. 2022 Jan;164:106496. doi: 10.1016/j.aap.2021.106496. Epub 2021 Nov 18.

Heterogeneous ensemble learning for enhanced crash forecasts - A frequentist and machine learning based stacking framework.

J Safety Res. 2023 Feb;84:418-434. doi: 10.1016/j.jsr.2022.12.005. Epub 2022 Dec 14.

Examining imbalanced classification algorithms in predicting real-time traffic crash risk.

Accid Anal Prev. 2020 Sep;144:105610. doi: 10.1016/j.aap.2020.105610. Epub 2020 Jun 16.

Applying machine learning approaches to analyze the vulnerable road-users' crashes at statewide traffic analysis zones.

J Safety Res. 2019 Sep;70:275-288. doi: 10.1016/j.jsr.2019.04.008. Epub 2019 May 10.

Injury severity prediction of traffic crashes with ensemble machine learning techniques: a comparative study.

Int J Inj Contr Saf Promot. 2021 Dec;28(4):408-427. doi: 10.1080/17457300.2021.1928233. Epub 2021 Jun 1.

Classification of autonomous vehicle crash severity: Solving the problems of imbalanced datasets and small sample size.

Accid Anal Prev. 2024 Sep;205:107666. doi: 10.1016/j.aap.2024.107666. Epub 2024 Jun 20.

Cited by

AI-based prediction of traffic crash severity for improving road safety and transportation efficiency.

Sci Rep. 2025 Jul 28;15(1):27468. doi: 10.1038/s41598-025-10970-7.

Identifying fatigue of climbing workers using physiological data based on the XGBoost algorithm.

Front Public Health. 2024 Oct 9;12:1462675. doi: 10.3389/fpubh.2024.1462675. eCollection 2024.

Crash severity analysis: A data-enhanced double layer stacking model using semantic understanding.

Heliyon. 2024 Apr 29;10(9):e30117. doi: 10.1016/j.heliyon.2024.e30117. eCollection 2024 May 15.

Development and validation of prediction models for papillary thyroid cancer structural recurrence using machine learning approaches.

BMC Cancer. 2024 Apr 8;24(1):427. doi: 10.1186/s12885-024-12146-4.

Development and validation of predictive models for myopia onset and progression using extensive 15-year refractive data in children and adolescents.

J Transl Med. 2024 Mar 17;22(1):289. doi: 10.1186/s12967-024-05075-0.

The difference in quasi-induced exposure to crashes involving various hazardous driving actions.

PLoS One. 2023 Feb 2;18(2):e0279387. doi: 10.1371/journal.pone.0279387. eCollection 2023.

Data-Driven Estimation of a Driving Safety Tolerance Zone Using Imbalanced Machine Learning.

Sensors (Basel). 2022 Jul 15;22(14):5309. doi: 10.3390/s22145309.
