Progressive distribution adapted neural networks for cross-corpus speech emotion recognition.

Author information

Zong Yuan, Lian Hailun, Zhang Jiacheng, Feng Ercui, Lu Cheng, Chang Hongli, Tang Chuangao

Affiliations

Key Laboratory of Child Development and Learning Science of Ministry of Education, Southeast University, Nanjing, China.

School of Biological Science and Medical Engineering, Southeast University, Nanjing, China.

Publication information

Front Neurorobot. 2022 Sep 15;16:987146. doi: 10.3389/fnbot.2022.987146. eCollection 2022.

Abstract

In this paper, we investigate a challenging but interesting task in speech emotion recognition (SER) research: cross-corpus SER. Unlike conventional SER, cross-corpus SER draws its training (source) and testing (target) samples from different speech corpora, which causes a feature distribution mismatch between them. As a result, the performance of most existing SER methods may drop sharply. To cope with this problem, we propose a simple yet effective deep transfer learning method called progressive distribution adapted neural networks (PDAN). PDAN uses a convolutional neural network (CNN) as the backbone and takes the speech spectrum as input, forming an end-to-end learning framework. More importantly, its basic idea for solving cross-corpus SER is straightforward: a progressive distribution adapted regularization term is added to the original loss function to guide network training, which enhances the backbone's ability to learn corpus-invariant features. To evaluate PDAN, we conducted extensive cross-corpus SER experiments on speech emotion corpora including EmoDB, eNTERFACE, and CASIA. Experimental results show that PDAN outperforms most well-performing deep and subspace transfer learning methods on cross-corpus SER tasks.
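The abstract does not state the exact form of the regularization term, so the following is only a rough sketch of the general pattern "supervised loss on labeled source data plus a weighted source/target distribution-alignment penalty". It assumes a Gaussian-kernel maximum mean discrepancy (MMD) as the alignment term and a DANN-style ramp-up schedule as one possible reading of "progressive". Both choices, and the names `mmd2`, `total_loss`, and `ramp_weight`, are hypothetical illustrations and are not taken from PDAN itself.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise squared Euclidean distances between rows of x and rows of y
    d2 = np.sum(x**2, axis=1)[:, None] + np.sum(y**2, axis=1)[None, :] - 2.0 * x @ y.T
    return np.exp(-d2 / (2.0 * sigma**2))

def mmd2(xs, xt, sigma=1.0):
    # Biased estimate of squared MMD between source (xs) and target (xt) features
    k_ss = gaussian_kernel(xs, xs, sigma).mean()
    k_tt = gaussian_kernel(xt, xt, sigma).mean()
    k_st = gaussian_kernel(xs, xt, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st

def cross_entropy(logits, labels):
    # Numerically stable softmax cross-entropy on labeled source samples
    z = logits - logits.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(labels)), labels].mean()

def ramp_weight(progress, gamma=10.0):
    # Hypothetical DANN-style schedule: 0 at progress=0, approaching 1 as progress -> 1
    return 2.0 / (1.0 + np.exp(-gamma * progress)) - 1.0

def total_loss(src_logits, src_labels, src_feats, tgt_feats, progress):
    # Hypothetical objective: source classification loss + ramped alignment penalty.
    # Target samples are unlabeled, so they only enter through the MMD term.
    lam = ramp_weight(progress)
    return cross_entropy(src_logits, src_labels) + lam * mmd2(src_feats, tgt_feats)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src_feats = rng.normal(size=(32, 8))            # stand-in for CNN features (source)
    tgt_feats = rng.normal(loc=0.5, size=(32, 8))   # shifted stand-in (target corpus)
    src_logits = rng.normal(size=(32, 5))
    src_labels = rng.integers(0, 5, size=32)
    for progress in (0.0, 0.5, 1.0):
        print(progress, total_loss(src_logits, src_labels, src_feats, tgt_feats, progress))
```

Ramping the weight up from zero lets the classifier learn from source labels before the alignment term dominates; in a real network the features would come from the backbone and the loss would be minimized by gradient descent, which this NumPy sketch does not do.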
