School of Mechanical and Electrical Engineering, Chengdu University of Technology, Chengdu 610059, China.
China Unicom Digital Technology Co., Ltd. Hubei Branch, Wuhan 430015, China.
Sensors (Basel). 2023 Oct 7;23(19):8293. doi: 10.3390/s23198293.
Negative emotions in drivers can lead to dangerous driving behaviors, which in turn cause serious traffic accidents. However, most current studies on driver emotion rely on a single modality, such as EEG, eye tracking, or driving data. In complex situations, a single modality may fail to capture a driver's complete emotional state and offers poor robustness. In recent years, some studies have applied multimodal approaches to monitor individual emotions such as driver fatigue or anger, but in real driving environments, negative emotions including sadness, anger, fear, and fatigue all have a significant impact on driving safety. Very few studies, however, have used multimodal data to accurately recognize a driver's overall emotional state. Therefore, this paper adopts a multimodal approach to improve comprehensive driver emotion recognition. By combining three modalities, the driver's voice, facial image, and video sequence, a six-class driver emotion classification task is performed: sadness, anger, fear, fatigue, happiness, and neutrality. To accurately identify drivers' negative emotions and thereby improve driving safety, this paper proposes a multimodal fusion framework based on a CNN + Bi-LSTM + HAM architecture for driver emotion recognition. The framework fuses feature vectors extracted from driver audio, facial expressions, and video sequences for comprehensive driver emotion recognition. Experiments demonstrate the effectiveness of the proposed multimodal approach for driver emotion recognition, achieving a recognition accuracy of 85.52%. The validity of the method is further verified through comparative experiments and evaluation metrics such as accuracy and F1 score.
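To make the described architecture concrete, the following is a minimal PyTorch sketch of a three-branch fusion network of the kind the abstract outlines: a CNN over an audio spectrogram, a CNN over a face image, a per-frame CNN followed by a Bi-LSTM over a video sequence, a hybrid attention module (HAM) over the concatenated features, and a six-way classification head. The abstract does not specify layer sizes, the HAM internals, or the fusion order, so every dimension, module layout, and name below is an assumption for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class HybridAttention(nn.Module):
    """Stand-in for the paper's HAM: learns per-feature gates that
    reweight the fused multimodal vector. The real HAM's internal
    structure is not given in the abstract; this is an assumption."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(dim, dim // 4), nn.ReLU(),
            nn.Linear(dim // 4, dim), nn.Sigmoid())

    def forward(self, x):                       # x: (batch, dim)
        return x * self.gate(x)                 # attention-weighted features

class DriverEmotionNet(nn.Module):
    """Three-branch fusion: audio-spectrogram CNN, face-image CNN,
    and a per-frame CNN + Bi-LSTM over video, followed by HAM and a
    6-way head (sad, angry, fearful, fatigued, happy, neutral)."""
    def __init__(self, feat=128, hidden=64, n_classes=6):
        super().__init__()
        def cnn(in_ch):                         # tiny CNN encoder template
            return nn.Sequential(
                nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat))
        self.audio_cnn = cnn(1)                 # 1-channel log-mel spectrogram
        self.face_cnn = cnn(3)                  # RGB face crop
        self.frame_cnn = cnn(3)                 # shared encoder for video frames
        self.bilstm = nn.LSTM(feat, hidden, batch_first=True,
                              bidirectional=True)
        self.ham = HybridAttention(feat * 2 + hidden * 2)
        self.head = nn.Linear(feat * 2 + hidden * 2, n_classes)

    def forward(self, audio, face, video):
        # audio: (B,1,H,W); face: (B,3,H,W); video: (B,T,3,H,W)
        a = self.audio_cnn(audio)
        f = self.face_cnn(face)
        B, T = video.shape[:2]
        frames = self.frame_cnn(video.flatten(0, 1)).view(B, T, -1)
        _, (h, _) = self.bilstm(frames)         # h: (2, B, hidden)
        v = torch.cat([h[0], h[1]], dim=1)      # forward + backward states
        fused = self.ham(torch.cat([a, f, v], dim=1))
        return self.head(fused)                 # logits over six emotions

# Smoke test with dummy tensors (batch of 2, 8 video frames).
model = DriverEmotionNet()
logits = model(torch.randn(2, 1, 64, 64), torch.randn(2, 3, 64, 64),
               torch.randn(2, 8, 3, 64, 64))
print(logits.shape)                             # torch.Size([2, 6])

The sketch reflects the fusion strategy stated in the abstract (feature-level concatenation of the three modality vectors before attention and classification); the reported 85.52% accuracy was obtained with the authors' full model, not this simplified stand-in.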