Progressively Discriminative Transfer Network for Cross-Corpus Speech Emotion Recognition.

Authors

Lu Cheng, Tang Chuangao, Zhang Jiacheng, Zong Yuan

Affiliations

Key Laboratory of Child Development and Learning Science (Ministry of Education), Southeast University, Nanjing 210096, China.

School of Information Science and Engineering, Southeast University, Nanjing 210096, China.

Publication information

Entropy (Basel). 2022 Jul 29;24(8):1046. doi: 10.3390/e24081046.

DOI:10.3390/e24081046
PMID:36010710
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9407047/
Abstract

Cross-corpus speech emotion recognition (SER) is a challenging task. Its difficulty lies in the mismatch between the feature distributions of the training (source domain) and testing (target domain) data, which degrades performance when the model handles data from a new domain. Previous works use domain adaptation (DA) to eliminate the domain shift between the source and target domains and have achieved promising performance in SER. However, these methods mainly treat the cross-corpus task simply as a DA problem, directly aligning the distributions across domains in a common feature space. In this case, excessively narrowing the domain distance impairs the emotion discrimination of speech features, since it is difficult to maintain the completeness of the emotion space with an emotion classifier alone. To overcome this issue, we propose a progressively discriminative transfer network (PDTN) for cross-corpus SER, which enhances the emotion discrimination ability of speech features while eliminating the mismatch between the source and target corpora. In detail, we design two special losses in the feature layers of PDTN: an emotion discriminant loss L_d and a distribution alignment loss L_a. By incorporating prior knowledge of speech emotion into feature learning (i.e., high- and low-valence speech emotion features have their respective cluster centers), we integrate a valence-aware center loss L_v and an emotion-aware center loss L_c as L_d to guarantee the discriminative learning of speech emotions beyond what the emotion classifier provides. Furthermore, a multi-layer distribution alignment loss L_a is adopted to more precisely eliminate the discrepancy of feature distributions between the source and target domains. Finally, by optimizing PDTN with a combination of three losses, i.e., the cross-entropy loss L_e, L_d, and L_a, we can gradually eliminate the domain mismatch between the source and target corpora while maintaining the emotion discrimination of speech features. Extensive experimental results on six cross-corpus tasks across three datasets, i.e., Emo-DB, eNTERFACE, and CASIA, show that the proposed PDTN outperforms state-of-the-art methods.
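
The way the three losses named in the abstract could combine is sketched below in plain Python. This is a rough illustration under stated assumptions, not the authors' implementation: the abstract does not give the loss formulas, so the center losses here use per-batch group means as centers, the alignment term is a simple squared distance between source and target feature means at each layer (a linear-kernel MMD stand-in), both center losses are applied to the last feature layer, and the weights w_d and w_a as well as every function name are hypothetical.

```python
import math


def mean_vec(rows):
    """Column-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def center_loss(feats, labels):
    """Mean squared distance of each feature to the mean of its own label group.

    Assumption: centers are batch means, not learned parameters.
    """
    groups = {}
    for f, y in zip(feats, labels):
        groups.setdefault(y, []).append(f)
    centers = {y: mean_vec(rows) for y, rows in groups.items()}
    return sum(sq_dist(f, centers[y]) for f, y in zip(feats, labels)) / len(feats)


def cross_entropy(probs, labels):
    """Average negative log-probability assigned to the true emotion label."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def mean_alignment(src_layers, tgt_layers):
    """Sum over layers of the squared distance between source and target means.

    Assumption: a linear-kernel MMD stand-in for the multi-layer alignment loss.
    """
    return sum(sq_dist(mean_vec(s), mean_vec(t))
               for s, t in zip(src_layers, tgt_layers))


def pdtn_style_objective(probs, src_layers, tgt_layers, emotions, valence_of,
                         w_d=1.0, w_a=1.0):
    """Combine L_e + w_d * (L_v + L_c) + w_a * L_a on labeled source data."""
    feats = src_layers[-1]                        # last source feature layer
    valences = [valence_of[e] for e in emotions]  # map emotion -> valence group
    l_e = cross_entropy(probs, emotions)
    l_d = center_loss(feats, valences) + center_loss(feats, emotions)
    l_a = mean_alignment(src_layers, tgt_layers)
    return l_e + w_d * l_d + w_a * l_a


if __name__ == "__main__":
    # Toy batch: two source samples with 1-D features, two unlabeled target samples.
    total = pdtn_style_objective(
        probs=[[0.9, 0.1], [0.2, 0.8]],
        src_layers=[[[0.0], [2.0]]],
        tgt_layers=[[[1.0], [1.0]]],
        emotions=[0, 1],
        valence_of={0: 0, 1: 0},  # hypothetical: both emotions share a valence group
    )
    print(total)
```

Only the source corpus contributes to the cross-entropy and center terms, since the target corpus is unlabeled in this setting; the target features enter solely through the alignment term.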

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/102f/9407047/71f2f8b15bb6/entropy-24-01046-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/102f/9407047/afa3a70bcc7e/entropy-24-01046-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/102f/9407047/1a90fca11f72/entropy-24-01046-g002.jpg

Similar articles

1. Progressively Discriminative Transfer Network for Cross-Corpus Speech Emotion Recognition. Entropy (Basel). 2022 Jul 29;24(8):1046. doi: 10.3390/e24081046.
2. Adapting Multiple Distributions for Bridging Emotions from Different Speech Corpora. Entropy (Basel). 2022 Sep 5;24(9):1250. doi: 10.3390/e24091250.
3. Progressive distribution adapted neural networks for cross-corpus speech emotion recognition. Front Neurorobot. 2022 Sep 15;16:987146. doi: 10.3389/fnbot.2022.987146. eCollection 2022.
4. Cross-Corpus Speech Emotion Recognition Based on Transfer Learning and Multi-Loss Dynamic Adjustment. Comput Intell Neurosci. 2022 Sep 20;2022:5019384. doi: 10.1155/2022/5019384. eCollection 2022.
5. Cross-Corpus Speech Emotion Recognition Based on Multi-Task Learning and Subdomain Adaptation. Entropy (Basel). 2023 Jan 7;25(1):124. doi: 10.3390/e25010124.
6. Cross-corpus speech emotion recognition with transformers: Leveraging handcrafted features and data augmentation. Comput Biol Med. 2024 Sep;179:108841. doi: 10.1016/j.compbiomed.2024.108841. Epub 2024 Jul 12.
7. Fusing Visual Attention CNN and Bag of Visual Words for Cross-Corpus Speech Emotion Recognition. Sensors (Basel). 2020 Sep 28;20(19):5559. doi: 10.3390/s20195559.
8. Multi-Path and Group-Loss-Based Network for Speech Emotion Recognition in Multi-Domain Datasets. Sensors (Basel). 2021 Feb 24;21(5):1579. doi: 10.3390/s21051579.
9. Cross-Language Speech Emotion Recognition Using Bag-of-Word Representations, Domain Adaptation, and Data Augmentation. Sensors (Basel). 2022 Aug 26;22(17):6445. doi: 10.3390/s22176445.
10. An adversarial discriminative temporal convolutional network for EEG-based cross-domain emotion recognition. Comput Biol Med. 2022 Feb;141:105048. doi: 10.1016/j.compbiomed.2021.105048. Epub 2021 Nov 22.

Cited by

1. A Comprehensive Review of Multimodal Emotion Recognition: Techniques, Challenges, and Future Directions. Biomimetics (Basel). 2025 Jun 27;10(7):418. doi: 10.3390/biomimetics10070418.
2. A Survey of Deep Learning-Based Multimodal Emotion Recognition: Speech, Text, and Face. Entropy (Basel). 2023 Oct 12;25(10):1440. doi: 10.3390/e25101440.
3. Audio Augmentation for Non-Native Children's Speech Recognition through Discriminative Learning. Entropy (Basel). 2022 Oct 19;24(10):1490. doi: 10.3390/e24101490.

References

1. Improving Cross-Corpus Speech Emotion Recognition with Adversarial Discriminative Domain Generalization (ADDoG). IEEE Trans Affect Comput. 2021 Oct-Dec;12(4):1055-1068. doi: 10.1109/taffc.2019.2916092. Epub 2019 May 14.
2. Deep Subdomain Adaptation Network for Image Classification. IEEE Trans Neural Netw Learn Syst. 2021 Apr;32(4):1713-1722. doi: 10.1109/TNNLS.2020.2988928. Epub 2021 Apr 2.