Speech Emotion Recognition Incorporating Relative Difficulty and Labeling Reliability.

Affiliation

School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Buk-gu, Gwangju 61005, Republic of Korea.

Publication information

Sensors (Basel). 2024 Jun 25;24(13):4111. doi: 10.3390/s24134111.

DOI: 10.3390/s24134111
PMID: 39000889
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11244487/
Abstract

Emotions in speech are expressed in various ways, and a speech emotion recognition (SER) model may perform poorly on unseen corpora whose emotional factors differ from those expressed in the training databases. To construct an SER model that is robust to unseen corpora, regularization approaches and metric losses have been studied. In this paper, we propose an SER method that incorporates the relative difficulty and labeling reliability of each training sample. Inspired by the Proxy-Anchor loss, we propose a novel loss function that gives higher gradients to the samples whose emotion labels are more difficult to estimate among those in the given minibatch. Annotators may label an emotion based on an emotional expression that resides in the conversational context or another modality but is not apparent in the given speech utterance. Some of the emotion labels may therefore be unreliable, and these unreliable labels may affect the proposed loss function more severely. To address this, we propose to apply label smoothing to the samples misclassified by a pre-trained SER model. Experimental results showed that SER performance on unseen corpora was improved by adopting the proposed loss function with label smoothing on the misclassified data.
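
The two ideas in the abstract can be illustrated with a minimal Python sketch. This is not the paper's formulation, which the abstract does not give. It assumes the standard label-smoothing rule, applied only to samples that a pre-trained model misclassified, and a generic log-sum-exp aggregation over per-sample losses in the spirit of Proxy-Anchor. The function names and the parameters `eps` and `alpha` are hypothetical.

```python
import math

def smooth_targets(label, num_classes, misclassified, eps=0.1):
    """Return a target distribution for one sample.

    Assumption: standard label smoothing, (1 - eps) * one_hot + eps / K,
    applied only when the pre-trained model misclassified the sample.
    """
    one_hot = [1.0 if k == label else 0.0 for k in range(num_classes)]
    if not misclassified:
        return one_hot
    return [(1.0 - eps) * t + eps / num_classes for t in one_hot]

def cross_entropy(probs, target):
    """Cross-entropy between predicted probabilities and a (soft) target."""
    return -sum(t * math.log(p) for p, t in zip(probs, target) if t > 0.0)

def batch_relative_weights(losses, alpha=1.0):
    """Gradient of (1/alpha) * log(1 + sum_i exp(alpha * loss_i)) w.r.t. each loss_i.

    Samples with a larger loss than the rest of the minibatch receive a
    larger weight, so difficulty is measured relative to the batch.
    """
    exps = [math.exp(alpha * loss) for loss in losses]
    denom = 1.0 + sum(exps)
    return [e / denom for e in exps]

# Hypothetical minibatch: (current model probabilities, label, pre-trained prediction)
batch = [
    ([0.7, 0.1, 0.1, 0.1], 0, 0),  # easy sample, pre-trained model was correct
    ([0.2, 0.3, 0.4, 0.1], 1, 2),  # hard sample, pre-trained model was wrong
    ([0.1, 0.2, 0.6, 0.1], 2, 2),  # moderate sample, pre-trained model was correct
]
losses = [
    cross_entropy(probs, smooth_targets(label, 4, pretrained_pred != label))
    for probs, label, pretrained_pred in batch
]
weights = batch_relative_weights(losses)
```

In this sketch the second sample has the largest loss in the minibatch and therefore the largest gradient weight, while its smoothed target limits how strongly a possibly unreliable label is enforced.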


Similar articles

1
Speech Emotion Recognition Incorporating Relative Difficulty and Labeling Reliability.
Sensors (Basel). 2024 Jun 25;24(13):4111. doi: 10.3390/s24134111.
2
Multi-Path and Group-Loss-Based Network for Speech Emotion Recognition in Multi-Domain Datasets.
Sensors (Basel). 2021 Feb 24;21(5):1579. doi: 10.3390/s21051579.
3
Adapting Multiple Distributions for Bridging Emotions from Different Speech Corpora.
Entropy (Basel). 2022 Sep 5;24(9):1250. doi: 10.3390/e24091250.
4
Speech Emotion Recognition Based on Selective Interpolation Synthetic Minority Over-Sampling Technique in Small Sample Environment.
Sensors (Basel). 2020 Apr 17;20(8):2297. doi: 10.3390/s20082297.
5
Adaptive Data Boosting Technique for Robust Personalized Speech Emotion in Emotionally-Imbalanced Small-Sample Environments.
Sensors (Basel). 2018 Nov 2;18(11):3744. doi: 10.3390/s18113744.
6
Random Deep Belief Networks for Recognizing Emotions from Speech Signals.
Comput Intell Neurosci. 2017;2017:1945630. doi: 10.1155/2017/1945630. Epub 2017 Mar 5.
7
Fusing Visual Attention CNN and Bag of Visual Words for Cross-Corpus Speech Emotion Recognition.
Sensors (Basel). 2020 Sep 28;20(19):5559. doi: 10.3390/s20195559.
8
Progressive distribution adapted neural networks for cross-corpus speech emotion recognition.
Front Neurorobot. 2022 Sep 15;16:987146. doi: 10.3389/fnbot.2022.987146. eCollection 2022.
9
The feature extraction based on texture image information for emotion sensing in speech.
Sensors (Basel). 2014 Sep 9;14(9):16692-714. doi: 10.3390/s140916692.
10
Progressively Discriminative Transfer Network for Cross-Corpus Speech Emotion Recognition.
Entropy (Basel). 2022 Jul 29;24(8):1046. doi: 10.3390/e24081046.
