Northwestern Polytechnical University, Xi'an, 710072, Shaanxi, China.
Comput Biol Med. 2022 Oct;149:105907. doi: 10.1016/j.compbiomed.2022.105907. Epub 2022 Jul 22.
Automatic Emotion Recognition (AER) is critical for naturalistic Human-Machine Interaction (HMI). Emotions can be detected through both external behaviors, e.g., tone of voice, and internal physiological signals, e.g., the electroencephalogram (EEG). In this paper, we first constructed a multi-modal emotion database, named the Multi-modal Emotion Database with four modalities (MED4). MED4 consists of synchronously recorded EEG, photoplethysmography, speech, and facial-image signals of participants while they were exposed to video stimuli designed to induce happy, sad, angry, and neutral emotions. The experiment was performed with 32 participants under two environmental conditions: a research lab with natural noise and an anechoic chamber. Four baseline algorithms were developed to verify the database and evaluate the performance of AER methods: Identification-vector + Probabilistic Linear Discriminant Analysis (I-vector + PLDA), Temporal Convolutional Network (TCN), Extreme Learning Machine (ELM), and Multi-Layer Perceptron (MLP). Furthermore, two fusion strategies, at the feature level and the decision level respectively, were designed to exploit both external and internal information about human status. The results showed that EEG signals yield higher emotion recognition accuracy than speech signals (88.92% in the anechoic chamber and 89.70% in the naturally noisy lab, versus 64.67% and 58.92%, respectively). Fusion strategies that combine speech and EEG signals improve overall recognition accuracy by 25.92% over speech and 1.67% over EEG in the anechoic chamber, and by 31.74% and 0.96% respectively in the naturally noisy lab. Fusion methods also enhance the robustness of AER in noisy environments. The MED4 database will be made publicly available to encourage researchers worldwide to develop and validate advanced AER methods.
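The abstract describes feature-level and decision-level fusion of speech and EEG information without detailing the pipeline. The sketch below illustrates the two generic fusion ideas only; the synthetic feature arrays, their dimensions, and the scikit-learn MLP classifier are illustrative assumptions, not the authors' actual models or features.

```python
# Minimal sketch of the two fusion strategies named in the abstract:
#  - feature-level fusion: concatenate speech and EEG feature vectors,
#    then train a single classifier on the joint representation;
#  - decision-level fusion: train one classifier per modality and combine
#    their class-probability outputs (here a simple unweighted average).
# All shapes and the choice of classifier are assumptions for illustration.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_train, n_test = 200, 50
speech_dim, eeg_dim, n_classes = 40, 128, 4   # happy, sad, angry, neutral

# Synthetic stand-ins for extracted speech and EEG features.
Xs_tr = rng.normal(size=(n_train, speech_dim))
Xe_tr = rng.normal(size=(n_train, eeg_dim))
Xs_te = rng.normal(size=(n_test, speech_dim))
Xe_te = rng.normal(size=(n_test, eeg_dim))
y_tr = rng.integers(0, n_classes, size=n_train)

# Feature-level fusion: one classifier on the concatenated features.
clf_feat = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf_feat.fit(np.hstack([Xs_tr, Xe_tr]), y_tr)
pred_feature_fusion = clf_feat.predict(np.hstack([Xs_te, Xe_te]))

# Decision-level fusion: modality-specific classifiers, averaged probabilities.
clf_s = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0).fit(Xs_tr, y_tr)
clf_e = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0).fit(Xe_tr, y_tr)
proba = (clf_s.predict_proba(Xs_te) + clf_e.predict_proba(Xe_te)) / 2.0
pred_decision_fusion = proba.argmax(axis=1)
```

In practice the decision-level combination could also be a weighted average or a vote; the paper reports results for both fusion levels but the weighting scheme is not specified in the abstract.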