Department of Computer Science and Technology, Anhui University of Finance and Economics, Bengbu 233030, China.
Sensors (Basel). 2024 Oct 12;24(20):6575. doi: 10.3390/s24206575.
Emotion analysis has recently become an important topic in artificial intelligence, particularly speech emotion analysis, which helps interpret speech, one of the most direct channels of human emotional communication. This study focuses on the emotion analysis of infant crying. Cries carry a variety of information, including hunger, pain, and discomfort. This paper proposes an improved classification model that combines ResNet and a transformer. It uses modified Mel-frequency cepstral coefficient (MFCC) features obtained through feature engineering from infant cries and integrates squeeze-and-excitation (SE) attention modules into the residual blocks to enhance the model's ability to adjust channel weights. The proposed method achieved an accuracy of 93% in experiments, with shorter training time and higher accuracy than the traditional models it was compared against. It provides an efficient and stable solution for infant cry classification.
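As an illustration of how an SE module can reweight channels inside a residual block, the following is a minimal NumPy sketch and not the authors' implementation. The feature-map shape, the reduction ratio `r`, the random weights `w1` and `w2`, and the ReLU stand-in used for the residual branch are all assumptions made for the example; the abstract does not specify them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_weights(feat, w1, w2):
    """Compute per-channel weights for a feature map of shape (C, H, W)."""
    squeezed = feat.mean(axis=(1, 2))          # squeeze: global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)    # excitation: bottleneck + ReLU -> (C // r,)
    return sigmoid(w2 @ hidden)                # expand + sigmoid -> (C,), each in (0, 1)

def se_residual_block(x, transform, w1, w2):
    """Residual block whose branch output is rescaled channel-wise by SE weights."""
    feat = transform(x)                        # residual branch (conv layers in a real ResNet)
    scale = se_weights(feat, w1, w2)
    return x + feat * scale[:, None, None]     # identity shortcut + reweighted branch

# Hypothetical sizes: 8 channels over a 13 x 20 MFCC-like map, reduction ratio 4.
rng = np.random.default_rng(0)
C, H, W, r = 8, 13, 20, 4
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1    # assumed (untrained) bottleneck weights
w2 = rng.standard_normal((C, C // r)) * 0.1    # assumed (untrained) expansion weights

def transform(t):
    # Placeholder for the block's convolutions; a plain ReLU keeps the sketch short.
    return np.maximum(t, 0.0)

weights = se_weights(transform(x), w1, w2)
out = se_residual_block(x, transform, w1, w2)
```

Because the sigmoid keeps every weight strictly between 0 and 1, the SE step can only attenuate channels of the residual branch relative to one another; the identity shortcut is left untouched.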