A deep learning framework for gender sensitive speech emotion recognition based on MFCC feature selection and SHAP analysis.

Authors

Hu Qingqing, Peng Yiran, Zheng Zhong

Affiliations

Faculty of Humanities and Arts, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau, 999078, China.

Faculty of Innovation Engineering, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau, 999078, China.

Publication information

Sci Rep. 2025 Aug 5;15(1):28569. doi: 10.1038/s41598-025-14016-w.

DOI:10.1038/s41598-025-14016-w

PMID:40764384

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12325928/

Abstract

Speech is one of the most efficient methods of communication among humans, inspiring advancements in machine speech processing under Natural Language Processing (NLP). This field aims to enable computers to analyze, comprehend, and generate human language naturally. Speech processing, as a subset of artificial intelligence, is rapidly expanding due to its applications in emotion recognition, human-computer interaction, and sentiment analysis. This study introduces a novel algorithm for emotion recognition from speech using deep learning techniques. The proposed model achieves up to a 15% improvement compared to state-of-the-art deep learning methods in speech emotion recognition. It employs advanced supervised learning algorithms and deep neural network architectures, including Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. These models are trained on labeled datasets to accurately classify emotions such as happiness, sadness, anger, fear, surprise, and neutrality. The research highlights the system's real-time application potential, such as analyzing audience emotional responses during live television broadcasts. By leveraging advancements in deep learning, the model achieves high accuracy in understanding and predicting emotional states, offering valuable insights into user behavior. This approach contributes to diverse domains, including media analysis, customer feedback systems, and human-machine interaction, showcasing the transformative potential of combining speech processing with neural networks.
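The title names MFCC (mel-frequency cepstral coefficient) features as the model input, but the abstract does not say how they are computed. The following is a minimal pure-Python sketch of a textbook MFCC pipeline (framing, Hamming window, power spectrum, mel filterbank, log, DCT-II). The frame length, hop size, filter count, and number of coefficients are illustrative assumptions, not the authors' settings, and the function names are hypothetical.

```python
import cmath
import math


def hz_to_mel(hz):
    """Convert frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum of one frame, bins 0..N/2 (slow but dependency-free)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    bank = []
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):          # rising edge
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):         # falling edge, peak of 1.0 at centre
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def dct2(values, n_coeffs):
    """Unnormalised DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n_coeffs)
    ]


def mfcc(signal, sample_rate, frame_len=256, hop=128, n_filters=20, n_coeffs=13):
    """Return one list of n_coeffs MFCCs per frame (all defaults are assumptions)."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * t / (frame_len - 1))
              for t in range(frame_len)]       # Hamming window
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(frame_len)]
        power = power_spectrum(frame)
        # Floor the filter energies to avoid log(0) on silent bands.
        log_energies = [
            math.log(max(sum(w * p for w, p in zip(row, power)), 1e-10))
            for row in bank
        ]
        features.append(dct2(log_energies, n_coeffs))
    return features


# Usage: a synthetic 440 Hz tone stands in for a speech recording.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(1024)]
coeffs = mfcc(tone, sample_rate)
print(len(coeffs), len(coeffs[0]))
```

Each frame yields a fixed-length coefficient vector, so a sequence of frames forms the kind of time-by-feature matrix that a CNN or LSTM classifier could consume. In practice a library implementation with an FFT would replace the naive DFT used here.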
