Progressively Discriminative Transfer Network for Cross-Corpus Speech Emotion Recognition

Lu Cheng, Tang Chuangao, Zhang Jiacheng, Zong Yuan

Key Laboratory of Child Development and Learning Science (Ministry of Education), Southeast University, Nanjing 210096, China.

School of Information Science and Engineering, Southeast University, Nanjing 210096, China.

Entropy (Basel). 2022 Jul 29;24(8):1046. doi: 10.3390/e24081046.

Cross-corpus speech emotion recognition (SER) is a challenging task. Its difficulty lies in the mismatch between the feature distributions of the training (source domain) and testing (target domain) data, which degrades performance when the model handles data from a new domain. Previous works use domain adaptation (DA) to eliminate the domain shift between the source and target domains and have achieved promising performance in SER. However, these methods mainly treat the cross-corpus task as a pure DA problem, directly aligning the distributions across domains in a common feature space. In this case, excessively narrowing the domain distance impairs the emotion discrimination of speech features, since an emotion classifier alone can hardly maintain the completeness of the emotion space. To overcome this issue, we propose a progressively discriminative transfer network (PDTN) for cross-corpus SER, which enhances the emotion discrimination ability of speech features while eliminating the mismatch between the source and target corpora. In detail, we design two special losses in the feature layers of PDTN: an emotion discriminant loss Ld and a distribution alignment loss La. By incorporating prior knowledge of speech emotion into feature learning (i.e., high- and low-valence speech emotion features have their respective cluster centers), we integrate a valence-aware center loss Lv and an emotion-aware center loss Lc into Ld, which guarantees discriminative learning of speech emotions beyond what the emotion classifier provides. Furthermore, a multi-layer distribution alignment loss La is adopted to eliminate the discrepancy between the source and target feature distributions more precisely. Finally, by optimizing PDTN with a combination of three losses, i.e., the cross-entropy loss Le, Ld, and La, we can gradually eliminate the domain mismatch between the source and target corpora while maintaining the emotion discrimination of speech features. Extensive experimental results on six cross-corpus tasks across three datasets, i.e., Emo-DB, eNTERFACE, and CASIA, reveal that the proposed PDTN outperforms state-of-the-art methods.
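The way the three losses fit together can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say which discrepancy measure La uses, so a linear-kernel MMD (squared distance between domain means) stands in for it here. The weights `w_d` and `w_a`, the choice to apply both center losses to the last feature layer only, and all function names are assumptions made for illustration.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy (Le) over labeled source samples."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def center_loss(features, labels, centers):
    """Half the mean squared distance between each feature and its class center."""
    diffs = features - centers[labels]
    return 0.5 * (diffs ** 2).sum(axis=1).mean()

def linear_mmd(source, target):
    """Squared distance between domain means (assumed stand-in for La)."""
    gap = source.mean(axis=0) - target.mean(axis=0)
    return float(gap @ gap)

def total_loss(logits, src_feats, tgt_feats, emotion_labels, valence_labels,
               emotion_centers, valence_centers, w_d=0.1, w_a=1.0):
    """Le + w_d * (Lv + Lc) + w_a * La, with La summed over feature layers.

    src_feats / tgt_feats are lists with one (batch, dim) array per layer.
    w_d and w_a are hypothetical trade-off weights.
    """
    l_e = cross_entropy(logits, emotion_labels)
    # Assumption: both center losses act on the last feature layer.
    l_c = center_loss(src_feats[-1], emotion_labels, emotion_centers)
    l_v = center_loss(src_feats[-1], valence_labels, valence_centers)
    # Multi-layer alignment: one discrepancy term per feature layer.
    l_a = sum(linear_mmd(s, t) for s, t in zip(src_feats, tgt_feats))
    return l_e + w_d * (l_v + l_c) + w_a * l_a
```

In this sketch only source samples carry emotion and valence labels, so Le, Lc, and Lv are computed on source features, while La compares unlabeled target features with source features at every layer.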


Similar articles

Progressively Discriminative Transfer Network for Cross-Corpus Speech Emotion Recognition.

Entropy (Basel). 2022 Jul 29;24(8):1046. doi: 10.3390/e24081046.

Adapting Multiple Distributions for Bridging Emotions from Different Speech Corpora.

Entropy (Basel). 2022 Sep 5;24(9):1250. doi: 10.3390/e24091250.

Progressive distribution adapted neural networks for cross-corpus speech emotion recognition.

Front Neurorobot. 2022 Sep 15;16:987146. doi: 10.3389/fnbot.2022.987146. eCollection 2022.

Cross-Corpus Speech Emotion Recognition Based on Transfer Learning and Multi-Loss Dynamic Adjustment.

Comput Intell Neurosci. 2022 Sep 20;2022:5019384. doi: 10.1155/2022/5019384. eCollection 2022.

Cross-Corpus Speech Emotion Recognition Based on Multi-Task Learning and Subdomain Adaptation.

Entropy (Basel). 2023 Jan 7;25(1):124. doi: 10.3390/e25010124.

Cross-corpus speech emotion recognition with transformers: Leveraging handcrafted features and data augmentation.

Comput Biol Med. 2024 Sep;179:108841. doi: 10.1016/j.compbiomed.2024.108841. Epub 2024 Jul 12.

Fusing Visual Attention CNN and Bag of Visual Words for Cross-Corpus Speech Emotion Recognition.

Sensors (Basel). 2020 Sep 28;20(19):5559. doi: 10.3390/s20195559.

Multi-Path and Group-Loss-Based Network for Speech Emotion Recognition in Multi-Domain Datasets.

Sensors (Basel). 2021 Feb 24;21(5):1579. doi: 10.3390/s21051579.

Cross-Language Speech Emotion Recognition Using Bag-of-Word Representations, Domain Adaptation, and Data Augmentation.

Sensors (Basel). 2022 Aug 26;22(17):6445. doi: 10.3390/s22176445.

An adversarial discriminative temporal convolutional network for EEG-based cross-domain emotion recognition.

Comput Biol Med. 2022 Feb;141:105048. doi: 10.1016/j.compbiomed.2021.105048. Epub 2021 Nov 22.

Cited by

A Comprehensive Review of Multimodal Emotion Recognition: Techniques, Challenges, and Future Directions.

Biomimetics (Basel). 2025 Jun 27;10(7):418. doi: 10.3390/biomimetics10070418.

A Survey of Deep Learning-Based Multimodal Emotion Recognition: Speech, Text, and Face.

Entropy (Basel). 2023 Oct 12;25(10):1440. doi: 10.3390/e25101440.

Audio Augmentation for Non-Native Children's Speech Recognition through Discriminative Learning.

Entropy (Basel). 2022 Oct 19;24(10):1490. doi: 10.3390/e24101490.

References

Improving Cross-Corpus Speech Emotion Recognition with Adversarial Discriminative Domain Generalization (ADDoG).

IEEE Trans Affect Comput. 2021 Oct-Dec;12(4):1055-1068. doi: 10.1109/taffc.2019.2916092. Epub 2019 May 14.

Deep Subdomain Adaptation Network for Image Classification.

IEEE Trans Neural Netw Learn Syst. 2021 Apr;32(4):1713-1722. doi: 10.1109/TNNLS.2020.2988928. Epub 2021 Apr 2.
