McLaurin Elease J, Lee John D, McDonald Anthony D, Aksan Nazan, Dawson Jeffrey, Tippin Jon, Rizzo Matthew
University of Wisconsin-Madison, 1415 Engineering Drive, Madison, WI 53706.
University of Iowa Hospitals and Clinics, 200 Hawkins Drive, Iowa City, IA 52242.
Transp Res Part F Traffic Psychol Behav. 2018 Oct;58:25-38. doi: 10.1016/j.trf.2018.05.019. Epub 2018 Jun 9.
One challenge in using naturalistic driving data is producing a holistic analysis of these highly variable datasets. Typical analyses focus on isolated events, such as large g-force accelerations indicating a possible near-crash. Examining isolated events is ill-suited for identifying patterns in continuous activities such as maintaining vehicle control. We present an alternative approach that converts driving data into a text representation and uses topic modeling to identify patterns across the dataset. This approach enables the discovery of non-linear patterns, reduces the dimensionality of the data, and captures subtle variations in driver behavior. In this study, topic models are used to concisely describe patterns in trips from drivers with and without untreated obstructive sleep apnea (OSA). The analysis included 5000 trips (50 trips from each of 100 drivers; 66 drivers with OSA; 34 comparison drivers). Trips were treated as documents, and speed and acceleration data from the trips were converted to "driving words." The identified patterns, called topics, were determined based on regularities in the co-occurrence of the driving words within the trips. This representation was used in random forest models to predict the driver condition (i.e., OSA or comparison) for each trip. Models with 10, 15 and 20 topics had better accuracy in predicting the driver condition, with a maximum AUC of 0.73 for a model with 20 topics. Trips from drivers with OSA were more likely to be defined by topics for smaller lateral accelerations at low speeds. The results demonstrate topic modeling as a useful tool for extracting meaningful information from naturalistic driving datasets.
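The conversion of trips into documents of "driving words" can be illustrated with a minimal sketch. The abstract does not report how speed and acceleration were discretized, so the bin edges, the token format, and the function names below are hypothetical choices made only for illustration, not the authors' method.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bin edges; the abstract does not give the actual discretization.
SPEED_EDGES_MPS = [5.0, 15.0, 25.0]       # yields 4 speed levels
LAT_ACCEL_EDGES_G = [0.05, 0.15, 0.30]    # yields 4 |lateral acceleration| levels
SPEED_LABELS = ["s0", "s1", "s2", "s3"]
ACCEL_LABELS = ["a0", "a1", "a2", "a3"]


def driving_word(speed_mps, lat_accel_g):
    """Map one (speed, lateral acceleration) sample to a discrete token."""
    s = SPEED_LABELS[bisect_right(SPEED_EDGES_MPS, speed_mps)]
    a = ACCEL_LABELS[bisect_right(LAT_ACCEL_EDGES_G, abs(lat_accel_g))]
    return f"{s}_{a}"


def trip_to_document(samples):
    """Turn a trip (a sequence of samples) into bag-of-words counts."""
    return Counter(driving_word(v, a) for v, a in samples)


# A toy trip with four samples: (speed in m/s, lateral acceleration in g).
trip = [(3.0, 0.02), (4.5, -0.04), (12.0, 0.20), (30.0, 0.01)]
doc = trip_to_document(trip)
```

Per-trip counts of this kind form a document-term matrix, which is the usual input to a topic model; the resulting per-trip topic proportions could then serve as features for a classifier such as a random forest, as the abstract describes.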