Ranking contributors to traffic crashes on mountainous freeways from an incomplete dataset: A sequential approach of multivariate imputation by chained equations and random forest classifier.

Affiliations

College of Civil and Transportation Engineering, Shenzhen University, Shenzhen, Guangdong, 518060 People's Republic of China.

School of Civil Engineering, The University of Queensland, St. Lucia 4072, Brisbane, Australia.

Publication information

Accid Anal Prev. 2020 Oct;146:105744. doi: 10.1016/j.aap.2020.105744. Epub 2020 Aug 27.

Abstract

Estimates of the effect of contributors to crash injury severity, and predictions of crash injury severity outcomes, often suffer from biases related to missing data in crash datasets that contain incomplete records. Because both estimation and prediction would improve greatly if the missing values were recovered, this study proposes a sequential approach to handle incomplete crash datasets and rank contributors to the injury severity of crashes on mountainous freeways in China. The sequential approach consists of two parts: (i) multivariate imputation by chained equations imputes the missing values of the independent variables; (ii) a random forest classifier analyses the correlation between the dependent and the independent variables. The first part considers different imputation methods depending on whether the independent variables are binary, categorical or continuous, whereas the second part classifies the correlations according to the random forest classifier. The proposed method was applied to a case study of mountainous freeways in China and compared with an analysis of the raw dataset to evaluate its effectiveness; the results show that the method significantly improves classification accuracy compared with existing methods. Moreover, the classifier ranked the contributors to the injury severity of traffic crashes on mountainous freeways, in order of importance: vehicle type, crash type, road longitudinal gradient, crash cause, curve radius, and deflection angles. Interestingly, environmental factors were found to be of lower importance.
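
The two-part sequence described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes scikit-learn's `IterativeImputer` as a stand-in for multivariate imputation by chained equations, handles continuous variables only (the study uses different imputation methods for binary and categorical variables), and runs on synthetic data. The feature names `gradient`, `curve_radius`, and `noise`, and the helper functions, are hypothetical.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestClassifier


def make_synthetic_crashes(n=400, missing_rate=0.2, seed=0):
    """Build a synthetic, purely illustrative dataset with missing values."""
    rng = np.random.default_rng(seed)
    gradient = rng.normal(0.0, 2.0, n)          # hypothetical longitudinal gradient (%)
    curve_radius = rng.normal(800.0, 200.0, n)  # hypothetical curve radius (m)
    noise = rng.normal(0.0, 1.0, n)             # feature unrelated to the label
    X = np.column_stack([gradient, curve_radius, noise])
    # Hypothetical binary severity label, driven mainly by gradient.
    y = (gradient + 0.3 * rng.normal(size=n) > 0).astype(int)
    # Knock out entries at random to mimic incomplete records.
    X_missing = X.copy()
    X_missing[rng.random(X.shape) < missing_rate] = np.nan
    return X_missing, y


def impute_then_rank(X_missing, y, feature_names, seed=0):
    """Part (i): chained-equation style imputation; part (ii): random forest ranking."""
    imputer = IterativeImputer(max_iter=10, random_state=seed)
    X_imputed = imputer.fit_transform(X_missing)
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X_imputed, y)
    # Sort features from most to least important.
    order = np.argsort(forest.feature_importances_)[::-1]
    ranking = [(feature_names[i], float(forest.feature_importances_[i])) for i in order]
    return X_imputed, ranking


names = ["gradient", "curve_radius", "noise"]
X_missing, y = make_synthetic_crashes()
X_imputed, ranking = impute_then_rank(X_missing, y, names)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

The ranking here relies on the forest's impurity-based `feature_importances_`; the abstract does not say which importance measure the study used, so that choice is an assumption of this sketch.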
