

Heterogeneous ensemble learning for enhanced crash forecasts - A frequentist and machine learning based stacking framework.

Author information

Ahmad Numan, Wali Behram, Khattak Asad J

Affiliations

Department of Civil & Environmental Engineering, The University of Tennessee, Knoxville, TN 37996, USA.

Urban Design 4 Health, Inc., 24 Jackie Circle, East Rochester, NY 14612, USA.

Publication information

J Safety Res. 2023 Feb;84:418-434. doi: 10.1016/j.jsr.2022.12.005. Epub 2022 Dec 14.

Abstract

INTRODUCTION

This study aims to improve the accuracy of crash-frequency predictions on roadway segments, which can be used to forecast future safety on roadway facilities. A variety of statistical and machine learning (ML) methods have been used to model crash frequency, with ML methods generally achieving higher prediction accuracy. Recently, heterogeneous ensemble methods (HEM), including "stacking," have emerged as more accurate and robust intelligent techniques that provide more reliable predictions.

METHODS

This study applies "stacking" to model crash frequency on five-lane undivided (5T) segments of urban and suburban arterials. The prediction performance of stacking is compared with that of parametric statistical models (Poisson and negative binomial) and three state-of-the-art ML techniques (decision tree, random forest, and gradient boosting), each of which is termed a base-learner. Stacking combines the individual base-learners through an optimal weighting scheme, which avoids the biased predictions that individual base-learners can produce because of differences in their specifications and prediction accuracies. Crash, traffic, and roadway inventory data from 2013 to 2017 were collected and integrated. The data are split into training (2013-2015), validation (2016), and testing (2017) datasets. After the five base-learners are trained on the training data, their predictions on the validation data are used to train a meta-learner.
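
The stacking workflow above can be sketched in Python with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation. The data are synthetic: 300 hypothetical segments with two hypothetical covariates, driveway density and fixed-object offset. The negative binomial base-learner is left out so the sketch depends only on scikit-learn; statsmodels' `NegativeBinomial` model could fill that slot. The abstract does not name the weighting scheme, so the meta-learner is assumed to be a non-negative linear combination of base-learner predictions, fitted with `LinearRegression(positive=True)`. The year-based split follows the abstract.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, PoissonRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic panel (hypothetical): 300 segments observed 2013-2017.
# Column 0: commercial driveways per mile; column 1: avg offset to fixed objects (ft).
n_seg = 300
X_seg = np.column_stack([rng.uniform(0, 40, n_seg), rng.uniform(2, 30, n_seg)])
mu = np.exp(0.5 + 0.03 * X_seg[:, 0] - 0.03 * X_seg[:, 1])
counts = {year: rng.poisson(mu) for year in range(2013, 2018)}

# Temporal split as in the abstract: train 2013-2015, validate 2016, test 2017.
X_train = np.vstack([X_seg] * 3)
y_train = np.concatenate([counts[year] for year in (2013, 2014, 2015)])
X_val, y_val = X_seg, counts[2016]
X_test, y_test = X_seg, counts[2017]

# Base-learners (negative binomial omitted in this sketch).
base_learners = {
    "poisson": make_pipeline(StandardScaler(), PoissonRegressor(alpha=1e-6, max_iter=1000)),
    "tree": DecisionTreeRegressor(max_depth=4, random_state=0),
    "forest": RandomForestRegressor(n_estimators=200, min_samples_leaf=5, random_state=0),
    "boosting": GradientBoostingRegressor(random_state=0),
}

# Step 1: fit every base-learner on the training years only.
for model in base_learners.values():
    model.fit(X_train, y_train)

# Step 2: base-learner predictions become the meta-learner's input features.
Z_val = np.column_stack([m.predict(X_val) for m in base_learners.values()])
Z_test = np.column_stack([m.predict(X_test) for m in base_learners.values()])

# Step 3 (assumed weighting scheme): non-negative weights fitted on the validation year.
meta = LinearRegression(positive=True, fit_intercept=False).fit(Z_val, y_val)
stacked_test = meta.predict(Z_test)

# Out-of-sample comparison on the held-out test year.
def rmse(pred):
    return float(np.sqrt(np.mean((y_test - pred) ** 2)))

scores = {name: rmse(Z_test[:, j]) for j, name in enumerate(base_learners)}
scores["stacking"] = rmse(stacked_test)
weights = {name: round(float(w), 3) for name, w in zip(base_learners, meta.coef_)}
print("meta-learner weights:", weights)
print("test RMSE:", {name: round(v, 3) for name, v in scores.items()})
```

scikit-learn's `StackingRegressor` automates stacking. However, it builds the meta-learner's training features from cross-validated predictions on the training set instead of from a separate validation year, so the split is written out by hand here. The fitted weights are not constrained to sum to one. Adding that constraint, or choosing a different meta-learner, is a design choice the abstract leaves open. Because the data are synthetic, the printed RMSE values say nothing about the paper's findings.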

RESULTS

Results of the statistical models show that crashes increase with the density (number per mile) of commercial driveways and decrease as the average offset distance to fixed objects increases. The individual ML methods show similar results in terms of variable importance. A comparison of the out-of-sample predictions of the various models confirms the superiority of "stacking" over the alternative methods considered.
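
The two kinds of evidence in these results can be read side by side. In a Poisson model, the sign of a coefficient β gives the direction of an effect, and exp(β) gives the multiplicative change in expected crashes per unit increase. A random forest's permutation importance ranks variables but carries no direction. The sketch below is hypothetical throughout. The data are simulated with effects chosen to match the directions reported above. Permutation importance is only one of several importance measures, and the abstract does not say which one the authors used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(1)
names = ["driveway_density", "fixed_object_offset"]  # hypothetical feature names
n = 1500
X = np.column_stack([
    rng.uniform(0, 40, n),  # commercial driveways per mile (hypothetical range)
    rng.uniform(2, 30, n),  # average offset to fixed objects, ft (hypothetical range)
])
# Simulated counts: more driveways -> more crashes; larger offset -> fewer crashes.
y = rng.poisson(np.exp(0.5 + 0.03 * X[:, 0] - 0.03 * X[:, 1]))

# Statistical view: exp(beta) > 1 means crashes rise with the variable, < 1 means they fall.
glm = PoissonRegressor(alpha=1e-6, max_iter=1000).fit(X, y)
irr = dict(zip(names, np.exp(glm.coef_)))

# ML view: mean drop in R^2 when each feature is randomly shuffled.
rf = RandomForestRegressor(n_estimators=200, min_samples_leaf=5, random_state=0).fit(X, y)
perm = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
importance = dict(zip(names, perm.importances_mean))

print("incidence rate ratios:", {k: round(float(v), 3) for k, v in irr.items()})
print("permutation importance:", {k: round(float(v), 3) for k, v in importance.items()})
```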

CONCLUSIONS AND PRACTICAL APPLICATIONS

From a practical standpoint, "stacking" can enhance prediction accuracy compared with a single base-learner of a particular specification. When applied systemically, stacking can help identify more appropriate countermeasures.

