Speech Emotion Recognition Incorporating Relative Difficulty and Labeling Reliability.

Affiliation

School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Buk-gu, Gwangju 61005, Republic of Korea.

Publication information

Sensors (Basel). 2024 Jun 25;24(13):4111. doi: 10.3390/s24134111.

DOI:10.3390/s24134111

PMID:39000889

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11244487/

Abstract

Emotions in speech are expressed in various ways, and a speech emotion recognition (SER) model may perform poorly on unseen corpora whose emotional factors differ from those expressed in the training databases. To construct SER models that are robust to unseen corpora, regularization approaches and metric losses have been studied. In this paper, we propose an SER method that incorporates the relative difficulty and labeling reliability of each training sample. Inspired by the Proxy-Anchor loss, we propose a novel loss function that gives higher gradients to the samples whose emotion labels are more difficult to estimate among those in the given minibatch. Annotators may label an emotion based on emotional expression that resides in the conversational context or in another modality but is not apparent in the given speech utterance. Some of the emotion labels may therefore be unreliable, and these unreliable labels may affect the proposed loss function more severely. To address this, we propose applying label smoothing to the samples misclassified by a pre-trained SER model. Experimental results showed that SER performance on unseen corpora improved when the proposed loss function was adopted together with label smoothing on the misclassified data.
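The label-smoothing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the uniform smoothing form, and the value of `epsilon` are assumptions made here for illustration, and the Proxy-Anchor-inspired loss itself is not reproduced because the abstract does not give its formula. The sketch keeps one-hot targets for samples that a pre-trained model classifies correctly and softens the targets only for the samples it misclassifies.

```python
import numpy as np


def selective_label_smoothing(labels, pretrained_preds, num_classes, epsilon=0.1):
    """Build training targets, smoothing only samples the pre-trained model got wrong.

    labels           : annotated emotion class index per sample
    pretrained_preds : class index predicted by a pre-trained SER model
    epsilon          : smoothing strength (hypothetical value, not from the paper)
    """
    labels = np.asarray(labels)
    pretrained_preds = np.asarray(pretrained_preds)

    # Start from hard one-hot targets for every sample.
    targets = np.eye(num_classes)[labels]

    # Samples whose annotated label disagrees with the pre-trained prediction
    # are treated as potentially unreliable.
    misclassified = pretrained_preds != labels

    # Uniform label smoothing (an assumed form): (1 - eps) * one_hot + eps / K.
    smoothed = targets * (1.0 - epsilon) + epsilon / num_classes
    targets[misclassified] = smoothed[misclassified]
    return targets


def soft_cross_entropy(logits, targets):
    """Mean cross-entropy between soft targets and softmax(logits)."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=1).mean())


# Example: three utterances, four emotion classes; the second is misclassified.
example_targets = selective_label_smoothing(
    labels=[0, 1, 2], pretrained_preds=[0, 2, 2], num_classes=4, epsilon=0.2
)
```

In this example only the second row of `example_targets` is softened; the other two rows stay one-hot. Plain cross-entropy is used here purely as a stand-in loss to show how the soft targets would be consumed.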

