Dept. of Civil and Construction Engineering, Western Michigan Univ., 4601 Campus Dr., G-238, Kalamazoo, MI, 49008-5316, United States.
Dept. of Statistics, Western Michigan Univ., 1903 W Michigan Ave, Kalamazoo, MI, 49008-5152, United States.
Accid Anal Prev. 2021 Feb;150:105899. doi: 10.1016/j.aap.2020.105899. Epub 2020 Dec 4.
The proliferation of digital textual archives in the transportation safety domain makes it imperative to develop efficient ways of extracting information from textual data sources. The present study uses crash narratives, complemented by crash metadata, to discern the prevalence and co-occurrence of themes that contribute to crash incidents. Ten years (2009-2018) of Michigan fatal traffic crash narratives were used as a case study. Structural topic modeling (STM) and network topology analysis were used to generate and examine the prevalence and interaction of themes in the crash narratives. These themes fell mainly into three categories: pre-crash events, crash locations, and parties involved in the crash. The main advantage of STM over other topic modeling approaches is that it allows researchers to discover themes in documents and to estimate how those topics relate to document metadata. The most prevalent topics for angle, head-on, rear-end, sideswipe, and single-motor-vehicle crashes were crash at a stop sign, crossing the centerline, unable to stop, lane-change maneuver, and run-off-road crash, respectively. The eigenvector centrality measure from the network topology analysis showed that event-related topics were consistently central in articulating crash occurrence. Topic centrality and the associations between topics varied across crash types. The efficacy of the generated topics in classifying crashes by type was tested with a machine learning algorithm, Random Forest. Classification accuracy on the held-out sample ranged from 89.3% for sideswipe crashes to 99.2% for single-motor-vehicle crashes. The high classification accuracy suggests that crash typing and consistency checks can be effectively automated using latent themes extracted from crash narratives.
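Eigenvector centrality gives a topic a high score when it is strongly connected to other high-scoring topics. Formally, the scores are the leading eigenvector of the topic network's adjacency matrix. The abstract does not say how the topic network was built, so the sketch below assumes a symmetric, non-negative weight matrix between topics (for instance, positive topic correlations). The topic labels and weights are hypothetical and are not the study's data. The function computes the centrality by power iteration in plain Python. NetworkX's `eigenvector_centrality` is a library alternative.

```python
import math


def eigenvector_centrality(adj, max_iter=1000, tol=1e-10):
    """Eigenvector centrality by power iteration.

    adj is a symmetric, non-negative weight matrix (list of lists) where
    adj[i][j] is the assumed association between topics i and j.
    Returned scores are scaled to unit Euclidean length.
    """
    n = len(adj)
    x = [1.0] * n  # uniform start vector
    for _ in range(max_iter):
        # Iterate with (A + I): it has the same eigenvectors as A but avoids
        # the oscillation plain power iteration shows on bipartite graphs.
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    raise RuntimeError("power iteration did not converge")


# Hypothetical topic network: labels and weights are illustrative only.
topics = ["pre-crash event", "location: intersection",
          "location: curve", "party: pedestrian"]
adj = [
    [0.0, 0.6, 0.5, 0.4],
    [0.6, 0.0, 0.1, 0.0],
    [0.5, 0.1, 0.0, 0.0],
    [0.4, 0.0, 0.0, 0.0],
]

scores = eigenvector_centrality(adj)
# Print topics from most to least central.
for name, s in sorted(zip(topics, scores), key=lambda t: -t[1]):
    print(f"{name:25s} {s:.3f}")
```

In this toy network the hub-like event topic receives the highest score. The example only illustrates the measure and does not reproduce any of the study's values.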