Department of Civil and Environmental Engineering, University of Wisconsin-Madison, Madison, WI 53706, United States of America.
The Key Laboratory of Road and Traffic Engineering, Ministry of Education, Tongji University, Shanghai, China.
Accid Anal Prev. 2024 Aug;203:107605. doi: 10.1016/j.aap.2024.107605. Epub 2024 May 13.
Safety is one of the most important considerations when evaluating the performance of autonomous vehicles (AVs). Real-world AV data, including trajectory, detection, and crash data, are becoming increasingly popular because they enable realistic evaluation of AV performance. While substantial research has used structured AV crash data to estimate general crash patterns, comprehensive exploration of AV crash narratives remains limited. These narratives contain latent information about AV crashes that can further the understanding of AV safety. Therefore, this study applies the Structural Topic Model (STM), a natural language processing technique, to extract latent topics from unstructured AV crash narratives while incorporating crash metadata (i.e., the severity and year of each crash). In total, 15 topics are identified and grouped into behavior-related, party-related, location-related, and general topics. Using these topics, AV crashes can be systematically described and clustered. Results from the STM suggest that AVs' abilities to interact with vulnerable road users (VRUs) and to react to lane-change behavior need further improvement. Moreover, an XGBoost model is developed to investigate the relationships between the topics and crash severity. The model significantly outperforms existing studies in terms of accuracy, suggesting that the extracted topics are closely related to crash severity. Interpretation of the model indicates that topics containing information about crash severity and VRUs have significant impacts on the model's output; such information is therefore suggested for inclusion in future AV crash reporting.