LLM Multimodal Traffic Accident Forecasting.

Authors

de Zarzà I, de Curtò J, Roig G, Calafate CT

Affiliations

Informatik und Mathematik, GOETHE-University Frankfurt am Main, 60323 Frankfurt am Main, Germany.

Departamento de Informática de Sistemas y Computadores, Universitat Politècnica de València, 46022 València, Spain.

Publication information

Sensors (Basel). 2023 Nov 16;23(22):9225. doi: 10.3390/s23229225.

DOI:10.3390/s23229225

PMID:38005612

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC10674612/

Abstract

With the rise in traffic congestion in urban centers, predicting accidents has become paramount for city planning and public safety. This work studies the efficacy of modern deep learning (DL) methods in forecasting traffic accidents and in enhancing Level-4 and Level-5 (L-4 and L-5) driving assistants with actionable visual and language cues. Using a rich dataset detailing accident occurrences, we compared the Transformer model against traditional time series models like ARIMA and the more recent Prophet model. We also analyzed feature importance using principal component analysis (PCA) loadings, uncovering key factors contributing to accidents. We introduce the idea of using real-time interventions with large language models (LLMs) in autonomous driving, relying on lightweight compact LLMs like LLaMA-2 and Zephyr-7b-α. Our exploration extends to multimodality through the Large Language-and-Vision Assistant (LLaVA), a Visual Language Model (VLM) that bridges visual and linguistic cues, used in conjunction with deep probabilistic reasoning to enhance the real-time responsiveness of autonomous driving systems. In this study, we elucidate the advantages of employing large multimodal models within DL and deep probabilistic programming for enhancing the performance and usability of time series forecasting and feature weight importance, particularly in a self-driving scenario. This work paves the way for safer, smarter cities, underpinned by data-driven decision making.
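The abstract mentions reading feature importance from PCA loadings. The following is a minimal sketch of that general technique, not the authors' implementation: it uses synthetic data, and the feature names (`rain_mm`, `traffic_volume`, `mean_speed`, `unrelated`), the helper functions, and the variance-weighted scoring rule are all assumptions introduced for illustration.

```python
import numpy as np


def pca_loadings(X, n_components=2):
    """Return (loadings, explained_variance_ratio) for standardized data.

    For standardized features, each loading is the correlation between
    a feature and a principal component.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal axes of the standardized data.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    eigvals = S**2 / (Z.shape[0] - 1)
    ratio = eigvals / eigvals.sum()
    loadings = Vt[:n_components].T * np.sqrt(eigvals[:n_components])
    return loadings, ratio[:n_components]


def rank_features(names, loadings, ratio):
    """Rank features by squared loading, weighted by explained variance.

    This scoring rule is one common heuristic, assumed here for the sketch.
    """
    score = (loadings**2) @ ratio
    order = np.argsort(score)[::-1]
    return [(names[i], float(score[i])) for i in order]


# Synthetic data (hypothetical): three features driven by one latent
# factor, plus one independent feature.
rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=n)
names = ["rain_mm", "traffic_volume", "mean_speed", "unrelated"]
X = np.column_stack([
    latent + 0.1 * rng.normal(size=n),
    latent + 0.1 * rng.normal(size=n),
    -latent + 0.1 * rng.normal(size=n),
    rng.normal(size=n),
])

loadings, ratio = pca_loadings(X, n_components=2)
ranking = rank_features(names, loadings, ratio)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setup the three features that share the latent factor load heavily on the first component, while the independent feature contributes mainly to the second, so it ranks lowest under the assumed scoring rule.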
