Jiangsu Key Laboratory of Urban ITS, Southeast University, Si Pai Lou #2, Nanjing, 210096, China; Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies, Si Pai Lou #2, Nanjing, 210096, China.
Department of Civil and Environmental Engineering, University of Wisconsin-Milwaukee, NWQ4414, P.O. Box 784, Milwaukee, WI 53201, United States.
Accid Anal Prev. 2018 Nov;120:281-294. doi: 10.1016/j.aap.2018.08.014. Epub 2018 Sep 1.
The primary objective of this study was to investigate how trip pattern variables extracted from large-scale taxi GPS data contribute to spatially aggregated crashes in urban areas. Five types of data were collected: crash data, large-scale taxi GPS data, road network attributes, land use features and socio-demographic data. A data-driven modeling approach based on Latent Dirichlet Allocation (LDA) was proposed for discovering hidden trip patterns in the taxi GPS dataset, and a total of fifty trip patterns were identified. The collected data and the identified trip patterns were then aggregated into 167 ZIP Code Tabulation Areas (ZCTAs). A random forest technique was used to identify the factors that contributed to total, property-damage-only (PDO) and fatal-plus-injury crashes in the selected ZCTAs during the study period. Geographically weighted Poisson regression (GWPR) models were then developed to relate the crash counts to the contributing factors selected by the random forest technique. Comparative analyses were conducted on three GWPR specifications: traditional traffic exposure variables only, trip pattern variables only, and both traditional exposure and trip pattern variables. The model specification results suggest that the trip pattern variables significantly affected the crash counts in the selected ZCTAs, and that the models considering both the traditional traffic exposure and the trip pattern variables had the best goodness-of-fit, with the lowest mean absolute deviation (MAD) and corrected Akaike information criterion (AICc) values.
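The two goodness-of-fit measures named above can be illustrated with a minimal sketch. It assumes the textbook definitions of MAD and AICc for a Poisson count model; the abstract does not state the exact formulas the authors used. The function names and the toy numbers are hypothetical. For a geographically weighted model, `n_params` would typically be the effective number of parameters rather than a simple coefficient count.

```python
import math


def mean_absolute_deviation(observed, predicted):
    """Average absolute gap between observed and predicted crash counts."""
    return sum(abs(y - m) for y, m in zip(observed, predicted)) / len(observed)


def poisson_log_likelihood(observed, predicted):
    """Poisson log-likelihood of observed counts given predicted means (> 0)."""
    return sum(
        y * math.log(m) - m - math.lgamma(y + 1)
        for y, m in zip(observed, predicted)
    )


def aicc(log_likelihood, n_obs, n_params):
    """Small-sample corrected AIC (assumed textbook form).

    n_params is an assumption of this sketch: for a geographically weighted
    model it would be the effective number of parameters.
    """
    aic = -2.0 * log_likelihood + 2.0 * n_params
    return aic + 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)


if __name__ == "__main__":
    # Hypothetical zone-level crash counts and fitted means (not from the study).
    observed = [12, 7, 30, 3, 18, 9]
    predicted = [10.5, 8.2, 27.9, 4.1, 19.3, 8.0]

    ll = poisson_log_likelihood(observed, predicted)
    print("MAD :", mean_absolute_deviation(observed, predicted))
    print("AICc:", aicc(ll, n_obs=len(observed), n_params=2))
```

Lower values of both measures indicate a better fit, so candidate specifications fitted to the same zones can be ranked by computing each measure per model and comparing.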