Suppr 超能文献



Vector learning representation for generalized speech emotion recognition.

Authors

Singkul Sattaya, Woraratpanya Kuntpong

Affiliation

Faculty of Information Technology, King Mongkut's Institute of Technology Ladkrabang, 1 Chalong Krung, Lat Krabang, 10520, Bangkok, Thailand.

Publication

Heliyon. 2022 Mar 28;8(3):e09196. doi: 10.1016/j.heliyon.2022.e09196. eCollection 2022 Mar.

DOI: 10.1016/j.heliyon.2022.e09196
PMID: 35846479
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9280549/
Abstract

Speech emotion recognition (SER) plays an important role in global business today as a means of improving service efficiency. In the SER literature, many techniques use deep learning to extract and learn features. Recently, we proposed end-to-end learning for a deep residual local feature learning block (DeepResLFLB). The advantages of end-to-end learning are low engineering effort and less hyperparameter tuning; nevertheless, this learning method easily falls into overfitting. This paper therefore describes a "verify-to-classify" framework applied to learning vectors extracted from the feature spaces of emotional information. The framework consists of two parts: speech emotion learning and speech emotion recognition. Speech emotion learning comprises two steps, speech emotion verification enrolled training and prediction. Residual learning (ResNet) with a squeeze-excitation (SE) block is the core component of both steps: it extracts emotional state vectors and builds an emotion model through the verification enrolled training, and the in-domain pre-trained weights of the trained emotion model are then transferred to the prediction step. The accepted model, validated by equal error rate (EER), is transferred to speech emotion recognition as out-domain pre-trained weights, ready for classification with a classical machine learning method. In this setting, a loss function suited to emotional vectors is important; two loss functions are proposed: angular prototypical loss and softmax combined with angular prototypical loss. Experiments on two publicly available datasets, Emo-DB and RAVDESS, covering both high- and low-quality recording environments, show that the proposed method significantly improves generalized performance and yields explainable emotion results when evaluated by standard metrics: EER, accuracy, precision, recall, and F1-score.
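The angular prototypical loss named in the abstract follows the general metric-learning formulation: scaled cosine similarity between a query embedding and per-class prototypes, trained with cross-entropy. Below is a minimal NumPy sketch of that general form — an illustration only, not the authors' implementation; the scale `w`, offset `b`, and batch layout are assumptions.

```python
import numpy as np

def angular_prototypical_loss(embeddings, w=10.0, b=-5.0):
    """Angular prototypical loss over a batch shaped (n_classes, n_utts, dim).

    For each class, the last utterance is the query and the mean of the
    remaining utterances is the class prototype.  Logits are scaled cosine
    similarities (w * cos + b), and the loss is the cross-entropy of each
    query against its own class prototype.
    """
    queries = embeddings[:, -1, :]                   # (n_classes, dim)
    prototypes = embeddings[:, :-1, :].mean(axis=1)  # (n_classes, dim)

    # cosine similarity matrix between every query and every prototype
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = w * (q @ p.T) + b                       # (n_classes, n_classes)

    # cross-entropy with the matching class on the diagonal
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

With well-separated class embeddings the loss approaches zero; in actual training, `w` and `b` would be learnable parameters rather than constants.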

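The abstract validates the verification model by equal error rate (EER): the operating point where the false-acceptance rate (FAR) and false-rejection rate (FRR) coincide. A minimal threshold-scanning sketch of the metric — generic, not the paper's evaluation code:

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Approximate EER by scanning candidate thresholds and returning the
    point where FAR and FRR are closest; `labels` are 1 for genuine
    (same-emotion) trials and 0 for impostor trials."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_gap, eer = np.inf, 1.0
    for t in np.unique(scores):
        far = np.mean(scores[labels == 0] >= t)  # impostors accepted
        frr = np.mean(scores[labels == 1] < t)   # genuine rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

A perfectly separable score distribution yields an EER of 0; production evaluation code would typically interpolate along the ROC curve rather than scan discrete thresholds.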

Figures (PMC9280549):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/df3a/9280549/767fbec28405/gr001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/df3a/9280549/b0c544c14dff/gr002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/df3a/9280549/cb0d40ce7c34/gr003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/df3a/9280549/d9c62a40aa47/gr004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/df3a/9280549/55e0a92a70ca/gr005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/df3a/9280549/9cf31441ea6d/gr006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/df3a/9280549/03de055c06cd/gr007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/df3a/9280549/4920aba84213/gr008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/df3a/9280549/0b240cbf7351/gr009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/df3a/9280549/3f9ef28c46fe/gr010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/df3a/9280549/cf4ef2eac5f5/gr011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/df3a/9280549/770f8444fdcc/gr012.jpg

Similar Articles

1. Vector learning representation for generalized speech emotion recognition.
Heliyon. 2022 Mar 28;8(3):e09196. doi: 10.1016/j.heliyon.2022.e09196. eCollection 2022 Mar.
2. Utterance Level Feature Aggregation with Deep Metric Learning for Speech Emotion Recognition.
Sensors (Basel). 2021 Jun 20;21(12):4233. doi: 10.3390/s21124233.
3. Effect on speech emotion classification of a feature selection approach using a convolutional neural network.
PeerJ Comput Sci. 2021 Nov 3;7:e766. doi: 10.7717/peerj-cs.766. eCollection 2021.
4. Impact of Feature Selection Algorithm on Speech Emotion Recognition Using Deep Convolutional Neural Network.
Sensors (Basel). 2020 Oct 23;20(21):6008. doi: 10.3390/s20216008.
5. Fusing Visual Attention CNN and Bag of Visual Words for Cross-Corpus Speech Emotion Recognition.
Sensors (Basel). 2020 Sep 28;20(19):5559. doi: 10.3390/s20195559.
6. IoT-Enabled WBAN and Machine Learning for Speech Emotion Recognition in Patients.
Sensors (Basel). 2023 Mar 8;23(6):2948. doi: 10.3390/s23062948.
7. Deep-Net: A Lightweight CNN-Based Speech Emotion Recognition System Using Deep Frequency Features.
Sensors (Basel). 2020 Sep 12;20(18):5212. doi: 10.3390/s20185212.
8. Two-Way Feature Extraction for Speech Emotion Recognition Using Deep Learning.
Sensors (Basel). 2022 Mar 19;22(6):2378. doi: 10.3390/s22062378.
9. Speech emotion recognition based on improved masking EMD and convolutional recurrent neural network.
Front Psychol. 2023 Jan 9;13:1075624. doi: 10.3389/fpsyg.2022.1075624. eCollection 2022.
10. Cross-corpus speech emotion recognition with transformers: Leveraging handcrafted features and data augmentation.
Comput Biol Med. 2024 Sep;179:108841. doi: 10.1016/j.compbiomed.2024.108841. Epub 2024 Jul 12.

References Cited in This Article

1. COVIDetectioNet: COVID-19 diagnosis system based on X-ray images using features selected from pre-learned deep features ensemble.
Appl Intell (Dordr). 2021;51(3):1213-1226. doi: 10.1007/s10489-020-01888-w. Epub 2020 Sep 18.
2. Impact of Feature Selection Algorithm on Speech Emotion Recognition Using Deep Convolutional Neural Network.
Sensors (Basel). 2020 Oct 23;20(21):6008. doi: 10.3390/s20216008.
3. Deep Joint Spatiotemporal Network (DJSTN) for Efficient Facial Expression Recognition.
Sensors (Basel). 2020 Mar 30;20(7):1936. doi: 10.3390/s20071936.
4. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English.
PLoS One. 2018 May 16;13(5):e0196391. doi: 10.1371/journal.pone.0196391. eCollection 2018.