
Speech emotion recognition based on transfer learning from the FaceNet framework.

Affiliations

Northeast Normal University, Changchun, Jilin Province 130117, China.

College of Computing and Software Engineering, Kennesaw State University, Marietta, Georgia 30060, USA.

Publication information

J Acoust Soc Am. 2021 Feb;149(2):1338. doi: 10.1121/10.0003530.

DOI: 10.1121/10.0003530
PMID: 33639796

Abstract

Speech plays an important role in human-computer emotional interaction. FaceNet used in face recognition achieves great success due to its excellent feature extraction. In this study, we adopt the FaceNet model and improve it for speech emotion recognition. To apply this model for our work, speech signals are divided into segments at a given time interval, and the signal segments are transformed into a discrete waveform diagram and spectrogram. Subsequently, the waveform and spectrogram are separately fed into FaceNet for end-to-end training. Our empirical study shows that the pretraining is effective on the spectrogram for FaceNet. Hence, we pretrain the network on the CASIA dataset and then fine-tune it on the IEMOCAP dataset with waveforms. It will derive the maximum transfer learning knowledge from the CASIA dataset due to its high accuracy. This high accuracy may be due to its clean signals. Our preliminary experimental results show an accuracy of 68.96% and 90% on the emotion benchmark datasets IEMOCAP and CASIA, respectively. The cross-training is then conducted on the dataset, and comprehensive experiments are performed. Experimental results indicate that the proposed approach outperforms state-of-the-art methods on the IEMOCAP dataset among single modal approaches.
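
The segmentation and spectrogram step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes fixed-length, non-overlapping segments and a Hann-windowed short-time Fourier transform, and the function names, segment length, frame length, and hop size are all hypothetical choices that the abstract does not specify. It uses only the Python standard library and a naive DFT, so it is meant for reading rather than for speed.

```python
import cmath
import math


def split_into_segments(samples, sample_rate, segment_seconds):
    """Cut a list of samples into fixed-length, non-overlapping segments.

    A trailing partial segment is dropped (an assumption; the abstract only
    says the signal is divided "at a given time interval").
    """
    seg_len = int(round(sample_rate * segment_seconds))
    if seg_len <= 0:
        raise ValueError("segment length must be positive")
    count = len(samples) // seg_len
    return [samples[i * seg_len:(i + 1) * seg_len] for i in range(count)]


def magnitude_spectrogram(segment, frame_len, hop):
    """Return a list of frames, each holding frame_len // 2 + 1 DFT magnitudes."""
    # Periodic Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(segment) - frame_len + 1, hop):
        frame = [segment[start + n] * window[n] for n in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


# Toy input: 2 s of a 125 Hz tone sampled at 1000 Hz (illustrative values only).
sample_rate = 1000
signal = [math.sin(2 * math.pi * 125 * n / sample_rate) for n in range(2000)]

segments = split_into_segments(signal, sample_rate, segment_seconds=0.5)
spec = magnitude_spectrogram(segments[0], frame_len=64, hop=32)

# The strongest bin of the first frame should sit at the tone frequency.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
print(len(segments), len(spec), len(spec[0]), peak_hz)
```

Each segment's raw samples correspond to the waveform input, and the matching `spec` array to the spectrogram input; how the paper renders these as images for FaceNet is not stated in the abstract and is left out here.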


