A resampling approach to disaggregate analysis of bus-involved crashes using panel data with excessive zeros.

Author information

Department of Industrial and System Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong; Department of Civil and Environmental Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong.

Department of Computer and Information Science, State Key Laboratory of Internet of Things for Smart City, University of Macau, Taipa, Macao.

Publication information

Accid Anal Prev. 2022 Jan;164:106496. doi: 10.1016/j.aap.2021.106496. Epub 2021 Nov 18.

Abstract

Public buses account for more than 70% of overall road-based public transport patronage in Hong Kong, and their crash involvement rate has been the highest among all public transport modes. Although previous studies have identified explanatory factors that affect the crash risk of buses, the use of heavily imbalanced crash data with excessive zero observations can lead to inaccurate parameter estimation. This study aims to resolve the excess-zero problem in disaggregate analysis of bus-involved crashes by generating synthetic data with a Synthetic Minority Over-Sampling Technique for panel data (SMOTE-P). The dataset comprises crash, traffic, and road inventory data for 88 road segments in Hong Kong from 2014 to 2017. To assess data balancing performance, other common resampling approaches are also considered: Random Under-sampling of the Majority Class (RUMC), Cluster-Based Under-Sampling (CBUS), and mixed resampling. Random effect Poisson (REP) models are estimated on the synthetic data, and a random effect zero-inflated Poisson (REZIP) model is estimated on the original data. Results indicate that the REP model based on SMOTE-P synthetic data outperforms both the REZIP model based on the original data and the REP models based on RUMC, CBUS, and mixed-resampling data in terms of statistical fit, prediction error, and explanatory factors identified. The SMOTE-P model estimates suggest that morning peak, evening peak, hourly traffic flow, average lane width, road length, bus stop density, percentage of buses in the traffic stream, and presence of a bus priority lane all affect bus-involved crash frequency. More importantly, this study provides a feasible solution for disaggregate crash analysis with imbalanced panel data.
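The two families of resampling named in the abstract can be illustrated with a minimal sketch. This is not the authors' SMOTE-P, whose panel-data handling is not described here; it only shows the generic ideas of random under-sampling of the majority class and SMOTE-style interpolation between minority observations. The toy records, the function names `random_undersample` and `smote_like`, and the choice of two features (hourly flow, lane width) are all hypothetical.

```python
import random

def random_undersample(rows, is_minority, rng):
    """Keep every minority row plus an equally sized random subset of majority rows."""
    minority = [r for r in rows if is_minority(r)]
    majority = [r for r in rows if not is_minority(r)]
    kept = rng.sample(majority, min(len(minority), len(majority)))
    return minority + kept

def smote_like(minority, n_new, k, rng):
    """Create n_new synthetic points, each interpolated between a minority point
    and one of its k nearest minority neighbours (the generic SMOTE idea)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to the base.
        # Features are unscaled here; a real analysis would likely standardize first.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical toy observations: (hourly_flow, lane_width_m, crash_count).
records = [
    (800, 3.0, 0), (950, 3.2, 0), (1200, 3.5, 1), (700, 2.9, 0),
    (1500, 3.3, 2), (1100, 3.1, 0), (1300, 3.6, 1), (900, 3.0, 0),
]
rng = random.Random(0)

# Under-sampling: drop zero-crash rows until both groups are the same size.
balanced = random_undersample(records, lambda r: r[2] > 0, rng)

# Over-sampling: synthesize extra feature vectors for the crash (minority) group.
crash_features = [(flow, width) for flow, width, crashes in records if crashes > 0]
synthetic = smote_like(crash_features, n_new=5, k=2, rng=rng)
```

Under-sampling discards information from the zero-crash observations, whereas interpolation adds plausible non-zero observations, which is one way to read the contrast the abstract draws between RUMC/CBUS and SMOTE-P.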
