Progressively Discriminative Transfer Network for Cross-Corpus Speech Emotion Recognition.

Authors

Lu Cheng, Tang Chuangao, Zhang Jiacheng, Zong Yuan

Affiliations

Key Laboratory of Child Development and Learning Science (Ministry of Education), Southeast University, Nanjing 210096, China.

School of Information Science and Engineering, Southeast University, Nanjing 210096, China.

Publication information

Entropy (Basel). 2022 Jul 29;24(8):1046. doi: 10.3390/e24081046.

Abstract

Cross-corpus speech emotion recognition (SER) is challenging because the feature distributions of the training (source domain) and testing (target domain) data are mismatched, which degrades performance when the model is applied to data from a new domain. Previous work has used domain adaptation (DA) to reduce the domain shift between the source and target domains and has achieved promising performance in SER. However, these methods mainly treat the cross-corpus task as a pure DA problem and directly align the distributions across domains in a common feature space. In this setting, narrowing the domain distance too aggressively impairs the emotion discrimination of the speech features, because an emotion classifier alone can hardly preserve the completeness of the emotion space. To overcome this issue, we propose a progressively discriminative transfer network (PDTN) for cross-corpus SER, which enhances the emotion discrimination ability of speech features while eliminating the mismatch between the source and target corpora. Specifically, we design two losses in the feature layers of PDTN: an emotion discriminant loss Ld and a distribution alignment loss La. By incorporating prior knowledge of speech emotion into feature learning (i.e., high- and low-valence speech emotion features have their respective cluster centers), we combine a valence-aware center loss Lv and an emotion-aware center loss Lc to form Ld, which enforces discriminative learning of speech emotions beyond what the emotion classifier alone provides. Furthermore, a multi-layer distribution alignment loss La is adopted to eliminate the discrepancy between the source and target feature distributions more precisely. Finally, by optimizing PDTN with the combination of three losses, i.e., the cross-entropy loss Le, Ld, and La, we gradually eliminate the domain mismatch between the source and target corpora while maintaining the emotion discrimination of speech features. Extensive experiments on six cross-corpus tasks over three datasets, i.e., Emo-DB, eNTERFACE, and CASIA, show that the proposed PDTN outperforms state-of-the-art methods.
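The way the three losses fit together can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the abstract does not give the exact form of La, so a linear-kernel MMD (squared distance between domain means) stands in for it here. The centers are simple batch means rather than learned parameters, Ld is assumed to be the plain sum Lv + Lc applied to the last feature layer, and the weights `alpha` and `beta`, the `valence_of` mapping, and all function names are hypothetical.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def center_loss(features, labels):
    """Mean squared distance of each feature to the mean of its label group.

    Assumption: centers are batch means; a trained model would more likely
    keep them as learnable or running-average parameters.
    """
    groups = {}
    for f, y in zip(features, labels):
        groups.setdefault(y, []).append(f)
    centers = {y: mean_vector(rows) for y, rows in groups.items()}
    total = sum(sq_dist(f, centers[y]) for f, y in zip(features, labels))
    return total / len(features)

def linear_mmd(source, target):
    """Squared distance between domain means (assumed stand-in for La)."""
    return sq_dist(mean_vector(source), mean_vector(target))

def cross_entropy(probs, labels):
    """Average negative log-probability of the true class (Le)."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def pdtn_style_objective(src_layers, tgt_layers, src_probs, emotions,
                         valence_of, alpha=1.0, beta=1.0):
    """Combine Le + alpha * Ld + beta * La for one batch.

    src_layers / tgt_layers: per-layer feature lists for source and target.
    src_probs: predicted class probabilities for the labeled source batch.
    valence_of: hypothetical map from emotion label to a valence group.
    """
    top = src_layers[-1]                      # last feature layer (assumption)
    valences = [valence_of[e] for e in emotions]
    l_e = cross_entropy(src_probs, emotions)
    l_v = center_loss(top, valences)          # valence-aware center loss Lv
    l_c = center_loss(top, emotions)          # emotion-aware center loss Lc
    l_d = l_v + l_c                           # Ld as a plain sum (assumption)
    # Multi-layer alignment: sum the discrepancy over every feature layer.
    l_a = sum(linear_mmd(s, t) for s, t in zip(src_layers, tgt_layers))
    return l_e + alpha * l_d + beta * l_a
```

Only the source batch contributes to Le and Ld because target labels are unavailable in the cross-corpus setting. The target data enters through the alignment term alone.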


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/102f/9407047/afa3a70bcc7e/entropy-24-01046-g001.jpg
