Alhussein Ghada, Alkhodari Mohanad, Ziogas Ioannis, Lamprou Charalampos, Khandoker Ahsan H, Hadjileontiadis Leontios J
Department of Biomedical Engineering and Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates.
Department of Biomedical Engineering and Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates; Cardiovascular Clinical Research Facility, Radcliffe Department of Medicine, University of Oxford, Oxford, United Kingdom.
Comput Methods Programs Biomed. 2025 Jun;265:108695. doi: 10.1016/j.cmpb.2025.108695. Epub 2025 Mar 18.
Emotion recognition in conversations using artificial intelligence (AI) has gained significant attention due to its potential to provide insights into human social behavior. This study extends AI-based emotion recognition to the recognition of emotional climate (EC), which reflects the joint emotional atmosphere dynamically created and perceived by peers during conversations. The objective is to propose and evaluate a novel approach, MLBispec, for EC recognition using speech signals.
The MLBispec approach involves time-windowed bispectral analysis of conversational speech signals to extract features related to nonlinear harmonic interactions. These features are combined with peers' affect dynamics, derived from emotion labeling for the same time windows, to form an extended feature set. The combined feature set is then fed into machine learning (ML) classifiers. MLBispec was evaluated on the IEMOCAP, K-EmoCon, and SEWA open-access datasets, which provide 2D emotion annotations (arousal and valence) divided into low/high classes. Additionally, cross-lingual experiments were conducted to test the framework's generalization across languages.
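The time-windowed bispectral step can be illustrated with a minimal sketch. The abstract does not state which bispectrum estimator, window length, or summary features MLBispec uses, so everything below is an assumption made for illustration: a direct FFT-based estimate averaged over overlapping Hann-windowed segments, with the hypothetical helper names `bispectrum` and `bispectral_features` and the placeholder parameters `nfft` and `hop`.

```python
import numpy as np


def bispectrum(x, nfft=64, hop=32):
    """Direct (FFT-based) bispectrum estimate of a 1-D signal.

    Averages X(f1) * X(f2) * conj(X(f1 + f2)) over overlapping segments.
    nfft and hop are illustrative values, not taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    win = np.hanning(nfft)
    idx = np.arange(nfft // 2)
    f1, f2 = np.meshgrid(idx, idx, indexing="ij")
    fsum = f1 + f2  # always < nfft because f1, f2 < nfft / 2
    acc = np.zeros((nfft // 2, nfft // 2), dtype=complex)
    n_seg = 0
    for start in range(0, len(x) - nfft + 1, hop):
        seg = x[start:start + nfft]
        seg = (seg - seg.mean()) * win  # remove DC, taper the segment
        X = np.fft.fft(seg)
        acc += X[f1] * X[f2] * np.conj(X[fsum])
        n_seg += 1
    if n_seg == 0:
        raise ValueError("signal is shorter than one segment")
    return acc / n_seg


def bispectral_features(B):
    """Two example summary features of a bispectrum (hypothetical choice)."""
    mag = np.abs(B)
    total = mag.sum()
    if total == 0:
        return {"mean_magnitude": 0.0, "entropy": 0.0}
    p = mag / total
    p = p[p > 0]  # avoid log(0)
    return {
        "mean_magnitude": float(mag.mean()),
        "entropy": float(-(p * np.log(p)).sum()),
    }
```

A peak in the magnitude of `B` at the bin pair (f1, f2) indicates that the components at f1, f2, and f1 + f2 are phase-coupled, which is the kind of nonlinear harmonic interaction the features are meant to capture. In a pipeline like the one described, such features would be computed per time window and concatenated with the affect-dynamics features before classification.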
Experimental results demonstrated that MLBispec outperformed previous deep learning-based state-of-the-art approaches in speech emotion recognition, achieving accuracies of 82.6% for arousal and 75.4% for valence. The framework's incorporation of both qualitative and quantitative EC measurements enhanced its ability to characterize the dynamic speech representations of conversational affective structures. Cross-lingual experiments further validated the robustness of MLBispec.
The findings highlight the effectiveness of MLBispec in objectively recognizing peers' EC during conversations, setting a new standard for practical emotionally aware applications. These include point-of-care healthcare, human-computer interfaces (HCI), and large language models (LLMs). By enabling dynamic and reliable EC recognition, MLBispec paves the way for advancements in emotionally intelligent systems.