Cross-Language Speech Emotion Recognition Using Bag-of-Word Representations, Domain Adaptation, and Data Augmentation.

Affiliation

Institut National de la Recherche Scientifique, University of Quebec, Montréal, QC H3C 5J9, Canada.

Publication information

Sensors (Basel). 2022 Aug 26;22(17):6445. doi: 10.3390/s22176445.

Abstract

To date, several methods have been explored for the challenging task of cross-language speech emotion recognition, including the bag-of-words (BoW) methodology for feature processing, domain adaptation for feature distribution "normalization", and data augmentation to make machine learning algorithms more robust across testing conditions. Their combined use, however, has yet to be explored. In this paper, we aim to fill this gap and compare the benefits achieved by combining different domain adaptation strategies with the BoW method, as well as with data augmentation. Moreover, while domain adaptation strategies, such as the correlation alignment (CORAL) method, require knowledge of the test data language, we propose a variant that we term N-CORAL, in which test languages (in our case, Chinese) are mapped to a common distribution in an unsupervised manner. Experiments with German, French, and Hungarian language datasets were performed, and the proposed N-CORAL method, combined with BoW and data augmentation, was shown to achieve the best arousal and valence prediction accuracy, highlighting the usefulness of the proposed method for "in the wild" speech emotion recognition. In fact, N-CORAL combined with BoW was shown to provide robustness across languages, whereas data augmentation provided additional robustness against cross-corpus nuance factors.
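The correlation alignment step mentioned in the abstract can be illustrated with a minimal NumPy sketch. It follows the commonly cited CORAL formulation (whiten the source features with their own covariance, then re-color them with the target covariance) and is not the authors' code. The function names, the `eps` regularizer, the mean-centering, and the synthetic data are all assumptions made for illustration. The final lines show one possible reading of N-CORAL, in which both training and test features are aligned to the statistics of a separate unlabeled reference set; the abstract does not spell out the exact procedure, so treat that part as hypothetical.

```python
import numpy as np


def _sym_matrix_power(mat, power):
    """Raise a symmetric positive-definite matrix to a real power."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    # V diag(lambda^p) V^T
    return (eigvecs * eigvals**power) @ eigvecs.T


def coral_align(source, target, eps=1e-6):
    """Transform `source` so its feature covariance matches that of `target`.

    Both inputs have shape (n_samples, n_features). No labels are used.
    `eps` is a small ridge term (an assumption) that keeps the
    covariance matrices invertible.
    """
    dim = source.shape[1]
    cov_s = np.cov(source, rowvar=False) + eps * np.eye(dim)
    cov_t = np.cov(target, rowvar=False) + eps * np.eye(dim)
    centered = source - source.mean(axis=0)  # centering is an added choice
    whiten = _sym_matrix_power(cov_s, -0.5)   # remove source correlations
    recolor = _sym_matrix_power(cov_t, 0.5)   # impose target correlations
    return centered @ whiten @ recolor


# Synthetic stand-ins for per-utterance feature vectors (hypothetical data).
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 4)) * np.array([1.0, 2.0, 3.0, 4.0])
test = rng.normal(size=(300, 4)) * np.array([2.0, 1.0, 4.0, 3.0])
reference = rng.normal(size=(400, 4)) * np.array([4.0, 3.0, 2.0, 1.0])

# Hypothetical N-CORAL-style usage: map every language to the same
# reference distribution, so the test language need not be known in advance.
train_aligned = coral_align(train, reference)
test_aligned = coral_align(test, reference)
```

After alignment, the second-order statistics of `train_aligned` and `test_aligned` both approximate those of `reference`, so a regressor trained on one is evaluated on features with a similar covariance structure.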

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/01c8/9460701/3264e1d0b24b/sensors-22-06445-g001.jpg
