Progressive distribution adapted neural networks for cross-corpus speech emotion recognition.

Author information

Zong Yuan, Lian Hailun, Zhang Jiacheng, Feng Ercui, Lu Cheng, Chang Hongli, Tang Chuangao

Affiliations

Key Laboratory of Child Development and Learning Science of Ministry of Education, Southeast University, Nanjing, China.

School of Biological Science and Medical Engineering, Southeast University, Nanjing, China.

Publication information

Front Neurorobot. 2022 Sep 15;16:987146. doi: 10.3389/fnbot.2022.987146. eCollection 2022.

DOI:10.3389/fnbot.2022.987146

PMID:36187564

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9520908/

Abstract

In this paper, we investigate a challenging but interesting task in speech emotion recognition (SER): cross-corpus SER. Unlike conventional SER, the training (source) and testing (target) samples in cross-corpus SER come from different speech corpora, which results in a feature distribution mismatch between them. As a result, the performance of most existing SER methods may drop sharply. To cope with this problem, we propose a simple yet effective deep transfer learning method called progressive distribution adapted neural networks (PDAN). PDAN uses a convolutional neural network (CNN) as the backbone and speech spectra as inputs, yielding an end-to-end learning framework. More importantly, its basic idea for solving cross-corpus SER is straightforward: it enhances the backbone's ability to learn corpus-invariant features by incorporating a progressive distribution adapted regularization term into the original loss function to guide network training. To evaluate the proposed PDAN, we conduct extensive cross-corpus SER experiments on speech emotion corpora including EmoDB, eNTERFACE, and CASIA. The experimental results show that PDAN outperforms most well-performing deep and subspace transfer learning methods on cross-corpus SER tasks.
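
The idea of adding a distribution adaptation regularizer to a classification loss can be illustrated with a minimal sketch. The abstract does not give the exact form of PDAN's regularization term, so the code below is an assumption throughout: it uses a simple linear maximum mean discrepancy (the squared distance between source and target feature means) as a stand-in, and it does not model the "progressive" aspect of the method. All function names and the `weight` parameter are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits_batch, labels):
    """Mean cross-entropy over labeled (source-corpus) samples."""
    losses = [-math.log(softmax(lg)[y]) for lg, y in zip(logits_batch, labels)]
    return sum(losses) / len(losses)


def mean_vector(features):
    """Per-dimension mean of a batch of feature vectors."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]


def linear_mmd(source_feats, target_feats):
    """Squared Euclidean distance between the two feature means.

    Assumption: a stand-in for the distribution adaptation term,
    not the regularizer defined in the paper.
    """
    mu_s = mean_vector(source_feats)
    mu_t = mean_vector(target_feats)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))


def regularized_loss(source_logits, source_labels,
                     source_feats, target_feats, weight=1.0):
    """Classification loss on the source corpus plus a weighted
    penalty on the source/target feature mismatch."""
    return (cross_entropy(source_logits, source_labels)
            + weight * linear_mmd(source_feats, target_feats))
```

In this sketch only the source samples need emotion labels; the target corpus contributes unlabeled features to the mismatch penalty, which is what pushes a backbone toward corpus-invariant features when the combined loss is minimized.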
