Improving Cross-Corpus Speech Emotion Recognition with Adversarial Discriminative Domain Generalization (ADDoG).

Authors

Gideon John, McInnis Melvin G, Provost Emily Mower

Affiliation

University of Michigan, Ann Arbor, MI, USA.

Publication information

IEEE Trans Affect Comput. 2021 Oct-Dec;12(4):1055-1068. doi: 10.1109/taffc.2019.2916092. Epub 2019 May 14.

Abstract

Automatic speech emotion recognition provides computers with critical context to enable user understanding. While methods trained and tested within the same dataset have been shown to be successful, they often fail when applied to unseen datasets. To address this, recent work has focused on adversarial methods to find more generalized representations of emotional speech. However, many of these methods have trouble converging and involve only datasets collected in laboratory conditions. In this paper, we introduce Adversarial Discriminative Domain Generalization (ADDoG), which follows an easier-to-train "meet in the middle" approach. The model iteratively moves the representations learned for each dataset closer to one another, improving cross-dataset generalization. We also introduce Multiclass ADDoG, or MADDoG, which extends the proposed method to more than two datasets simultaneously. Our results show consistent convergence for the introduced methods, with significantly improved results when not using labels from the target dataset. We also show that, in most cases, ADDoG and MADDoG can be used to improve upon baseline state-of-the-art methods when target dataset labels are added and in-the-wild data are considered. Even though our experiments focus on cross-corpus speech emotion, these methods could be used to remove unwanted factors of variation in other settings.
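
The "meet in the middle" idea of moving each dataset's representation toward the other can be illustrated with a deliberately simplified toy. This is not the ADDoG model: the abstract does not specify the architecture or losses, so the sketch below swaps the learned adversarial critic for a plain difference of feature means and swaps the encoder for one additive shift per dataset. The function name `meet_in_the_middle`, the parameters `steps` and `lr`, and the sample values are all hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of floats."""
    return sum(values) / len(values)


def meet_in_the_middle(domain_a, domain_b, steps=200, lr=0.05):
    """Toy sketch (assumption, not the paper's method).

    Learns one additive shift per dataset so that the shifted feature
    means converge. Both shifts are updated at every step, so neither
    dataset is treated as a fixed target.
    """
    shift_a, shift_b = 0.0, 0.0
    for _ in range(steps):
        # Stand-in for the critic: how far apart the two datasets
        # currently look (here, just the gap between shifted means).
        gap = (mean(domain_a) + shift_a) - (mean(domain_b) + shift_b)
        # Stand-in for the encoder update: each representation takes a
        # small step toward the other, shrinking the gap symmetrically.
        shift_a -= lr * gap
        shift_b += lr * gap
    return shift_a, shift_b


# Hypothetical 1-D "features" from two corpora with different offsets.
lab_features = [1.0, 1.2, 0.8, 1.1]
wild_features = [3.0, 2.7, 3.3, 3.2]

shift_lab, shift_wild = meet_in_the_middle(lab_features, wild_features)
remaining_gap = (mean(lab_features) + shift_lab) - (mean(wild_features) + shift_wild)
print(shift_lab, shift_wild, remaining_gap)
```

In this toy the two shifts are equal and opposite, so the aligned representations settle at the midpoint of the original means rather than at either dataset's starting point.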
