
Multi-Path and Group-Loss-Based Network for Speech Emotion Recognition in Multi-Domain Datasets

Affiliation

Artificial Intelligence Research Laboratory, Electronics and Telecommunications Research Institute, Daejeon 34129, Korea.

Publication

Sensors (Basel). 2021 Feb 24;21(5):1579. doi: 10.3390/s21051579.

DOI: 10.3390/s21051579
PMID: 33668254
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7956608/
Abstract

Speech emotion recognition (SER) is a natural method of recognizing individual emotions in everyday life. To distribute SER models to real-world applications, some key challenges must be overcome, such as the lack of datasets tagged with emotion labels and the weak generalization of the SER model for an unseen target domain. This study proposes a multi-path and group-loss-based network (MPGLN) for SER to support multi-domain adaptation. The proposed model includes a bidirectional long short-term memory-based temporal feature generator and a transferred feature extractor from the pre-trained VGG-like audio classification model (VGGish), and it learns simultaneously based on multiple losses according to the association of emotion labels in the discrete and dimensional models. For the evaluation of the MPGLN SER as applied to multi-cultural domain datasets, the Korean Emotional Speech Database (KESD), including KESDy18 and KESDy19, is constructed, and the English-speaking Interactive Emotional Dyadic Motion Capture database (IEMOCAP) is used. The evaluation of multi-domain adaptation and domain generalization showed 3.7% and 3.5% improvements, respectively, of the F1 score when comparing the performance of MPGLN SER with a baseline SER model that uses a temporal feature generator. We show that the MPGLN SER efficiently supports multi-domain adaptation and reinforces model generalization.
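The abstract states that the model "learns simultaneously based on multiple losses according to the association of emotion labels in the discrete and dimensional models", i.e. a classification loss over discrete emotion categories is combined with a regression loss over dimensional targets (such as arousal/valence). As a minimal sketch of that idea only — the paper's exact loss terms, group-loss formulation, and weights are not given in the abstract, and the function names and weights below are illustrative assumptions — a weighted joint objective can look like this:

```python
import numpy as np

def cross_entropy(logits, label):
    # Softmax cross-entropy for the discrete-emotion head (numerically stabilized).
    z = np.asarray(logits, float) - np.max(logits)
    p = np.exp(z) / np.sum(np.exp(z))
    return float(-np.log(p[label]))

def mse(pred, target):
    # Mean squared error for the dimensional head (e.g. arousal/valence).
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    return float(np.mean((pred - target) ** 2))

def joint_loss(discrete_logits, discrete_label, dim_pred, dim_target,
               weights=(1.0, 1.0)):
    # Weighted sum of the discrete and dimensional losses; in multi-loss
    # training both terms are backpropagated through the shared feature paths.
    return (weights[0] * cross_entropy(discrete_logits, discrete_label)
            + weights[1] * mse(dim_pred, dim_target))

# Example: a 3-class discrete prediction plus a 2-D dimensional prediction.
loss = joint_loss([2.0, 0.1, -1.0], 0, [0.6, 0.4], [0.5, 0.5], (1.0, 0.5))
```

In this kind of setup the relative weights control how strongly each label space shapes the shared representation; the actual balancing used by MPGLN would need to be taken from the full paper.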

Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e8d0/7956608/67fcb71f05cc/sensors-21-01579-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e8d0/7956608/a8a86265adf2/sensors-21-01579-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e8d0/7956608/197faeddd4e1/sensors-21-01579-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e8d0/7956608/5ce9c88aaf22/sensors-21-01579-g004a.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e8d0/7956608/4420f5abea08/sensors-21-01579-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e8d0/7956608/02027e208083/sensors-21-01579-g006.jpg

Similar Articles

1. Multi-Path and Group-Loss-Based Network for Speech Emotion Recognition in Multi-Domain Datasets.
   Sensors (Basel). 2021 Feb 24;21(5):1579. doi: 10.3390/s21051579.
2. Impact of Feature Selection Algorithm on Speech Emotion Recognition Using Deep Convolutional Neural Network.
   Sensors (Basel). 2020 Oct 23;20(21):6008. doi: 10.3390/s20216008.
3. Deep-Net: A Lightweight CNN-Based Speech Emotion Recognition System Using Deep Frequency Features.
   Sensors (Basel). 2020 Sep 12;20(18):5212. doi: 10.3390/s20185212.
4. Speech Emotion Recognition Incorporating Relative Difficulty and Labeling Reliability.
   Sensors (Basel). 2024 Jun 25;24(13):4111. doi: 10.3390/s24134111.
5. Combining a parallel 2D CNN with a self-attention Dilated Residual Network for CTC-based discrete speech emotion recognition.
   Neural Netw. 2021 Sep;141:52-60. doi: 10.1016/j.neunet.2021.03.013. Epub 2021 Mar 23.
6. A CNN-Assisted Enhanced Audio Signal Processing for Speech Emotion Recognition.
   Sensors (Basel). 2019 Dec 28;20(1):183. doi: 10.3390/s20010183.
7. Classifier Subset Selection for the Stacked Generalization Method Applied to Emotion Recognition in Speech.
   Sensors (Basel). 2015 Dec 25;16(1):21. doi: 10.3390/s16010021.
8. IoT-Enabled WBAN and Machine Learning for Speech Emotion Recognition in Patients.
   Sensors (Basel). 2023 Mar 8;23(6):2948. doi: 10.3390/s23062948.
9. Multi-resolution modulation-filtered cochleagram feature for LSTM-based dimensional emotion recognition from speech.
   Neural Netw. 2021 Aug;140:261-273. doi: 10.1016/j.neunet.2021.03.027. Epub 2021 Mar 25.
10. A New Network Structure for Speech Emotion Recognition Research.
   Sensors (Basel). 2024 Feb 22;24(5):1429. doi: 10.3390/s24051429.

Cited By

1. Special Issue "Emotion Intelligence Based on Smart Sensing".
   Sensors (Basel). 2023 Jan 18;23(3):1098. doi: 10.3390/s23031098.
2. Accelerating On-Device Learning with Layer-Wise Processor Selection Method on Unified Memory.
   Sensors (Basel). 2021 Mar 29;21(7):2364. doi: 10.3390/s21072364.

References

1. Deep-Net: A Lightweight CNN-Based Speech Emotion Recognition System Using Deep Frequency Features.
   Sensors (Basel). 2020 Sep 12;20(18):5212. doi: 10.3390/s20185212.
2. Emotion Recognition in Immersive Virtual Reality: From Statistics to Affective Computing.
   Sensors (Basel). 2020 Sep 10;20(18):5163. doi: 10.3390/s20185163.
3. The uulmMAC Database - A Multimodal Affective Corpus for Affective Computing in Human-Computer Interaction.
   Sensors (Basel). 2020 Apr 17;20(8):2308. doi: 10.3390/s20082308.
4. Speech Emotion Recognition with Heterogeneous Feature Unification of Deep Neural Network.
   Sensors (Basel). 2019 Jun 18;19(12):2730. doi: 10.3390/s19122730.
5. Adaptive Data Boosting Technique for Robust Personalized Speech Emotion in Emotionally-Imbalanced Small-Sample Environments.
   Sensors (Basel). 2018 Nov 2;18(11):3744. doi: 10.3390/s18113744.
6. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English.
   PLoS One. 2018 May 16;13(5):e0196391. doi: 10.1371/journal.pone.0196391. eCollection 2018.