Improving Cross-Corpus Speech Emotion Recognition with Adversarial Discriminative Domain Generalization (ADDoG)

Authors

Gideon John, McInnis Melvin G, Provost Emily Mower

Affiliations

University of Michigan, Ann Arbor, MI, USA.

Publication

IEEE Trans Affect Comput. 2021 Oct-Dec;12(4):1055-1068. doi: 10.1109/taffc.2019.2916092. Epub 2019 May 14.

DOI: 10.1109/taffc.2019.2916092
PMID: 35695825
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9173710/
Abstract

Automatic speech emotion recognition provides computers with critical context to enable user understanding. While methods trained and tested within the same dataset have been shown successful, they often fail when applied to unseen datasets. To address this, recent work has focused on adversarial methods to find more generalized representations of emotional speech. However, many of these methods have issues converging, and only involve datasets collected in laboratory conditions. In this paper, we introduce Adversarial Discriminative Domain Generalization (ADDoG), which follows an easier to train "meet in the middle" approach. The model iteratively moves representations learned for each dataset closer to one another, improving cross-dataset generalization. We also introduce Multiclass ADDoG, or MADDoG, which is able to extend the proposed method to more than two datasets, simultaneously. Our results show consistent convergence for the introduced methods, with significantly improved results when not using labels from the target dataset. We also show how, in most cases, ADDoG and MADDoG can be used to improve upon baseline state-of-the-art methods when target dataset labels are added and in-the-wild data are considered. Even though our experiments focus on cross-corpus speech emotion, these methods could be used to remove unwanted factors of variation in other settings.
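The abstract's "meet in the middle" idea — a critic that learns to tell which corpus a representation came from, and an encoder trained against it so the per-corpus representations drift toward one another — can be illustrated with a toy sketch. This is a hypothetical illustration, not the authors' implementation: the "corpora" are shifted Gaussian clouds, the encoder is a single shared linear map, and the learned neural critic is replaced by its closed-form stand-in (the direction separating the two encoded clouds).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for two speech-emotion corpora whose feature
# distributions are shifted against each other (the "dataset bias").
Xa = rng.normal(loc=+2.0, size=(200, 4))   # corpus A features
Xb = rng.normal(loc=-2.0, size=(200, 4))   # corpus B features

W = rng.normal(scale=0.1, size=(4, 2))     # shared linear "encoder"

def gap(W):
    """Distance between the mean representations of the two corpora."""
    return np.linalg.norm(Xa.mean(0) @ W - Xb.mean(0) @ W)

gap_before = gap(W)
m = Xa.mean(0) - Xb.mean(0)                # fixed input-space shift

for _ in range(100):
    gap_vec = m @ W                        # current gap in representation space
    n = np.linalg.norm(gap_vec)
    if n < 1e-9:
        break
    v = gap_vec / n                        # "critic": direction separating the corpora
    # Encoder step: move the two corpora 10% closer along the critic's
    # direction (ADDoG instead fools a learned critic, iteratively).
    W -= (0.1 * n / (m @ m)) * np.outer(m, v)

gap_after = gap(W)
print(f"gap before: {gap_before:.4f}, after: {gap_after:.6f}")
```

In the paper itself the critic is a trained network classifying corpus of origin, the encoder also carries an emotion-classification loss so the representation stays discriminative, and MADDoG extends the critic to more than two corpora; the sketch only shows the gap-shrinking dynamic the abstract describes.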

Similar Articles

1
Improving Cross-Corpus Speech Emotion Recognition with Adversarial Discriminative Domain Generalization (ADDoG).
IEEE Trans Affect Comput. 2021 Oct-Dec;12(4):1055-1068. doi: 10.1109/taffc.2019.2916092. Epub 2019 May 14.
2
Progressively Discriminative Transfer Network for Cross-Corpus Speech Emotion Recognition.
Entropy (Basel). 2022 Jul 29;24(8):1046. doi: 10.3390/e24081046.
3
Fusing Visual Attention CNN and Bag of Visual Words for Cross-Corpus Speech Emotion Recognition.
Sensors (Basel). 2020 Sep 28;20(19):5559. doi: 10.3390/s20195559.
4
Multiscale unsupervised domain adaptation for automatic pancreas segmentation in CT volumes using adversarial learning.
Med Phys. 2022 Sep;49(9):5799-5818. doi: 10.1002/mp.15827. Epub 2022 Jul 27.
5
Non-Intrusive Speech Quality Assessment Based on Deep Neural Networks for Speech Communication.
IEEE Trans Neural Netw Learn Syst. 2025 Jan;36(1):174-187. doi: 10.1109/TNNLS.2023.3321076. Epub 2025 Jan 7.
6
Adapting Multiple Distributions for Bridging Emotions from Different Speech Corpora.
Entropy (Basel). 2022 Sep 5;24(9):1250. doi: 10.3390/e24091250.
7
Multi-Path and Group-Loss-Based Network for Speech Emotion Recognition in Multi-Domain Datasets.
Sensors (Basel). 2021 Feb 24;21(5):1579. doi: 10.3390/s21051579.
8
An adversarial discriminative temporal convolutional network for EEG-based cross-domain emotion recognition.
Comput Biol Med. 2022 Feb;141:105048. doi: 10.1016/j.compbiomed.2021.105048. Epub 2021 Nov 22.
9
Multi-Domain Adversarial Feature Generalization for Person Re-Identification.
IEEE Trans Image Process. 2021;30:1596-1607. doi: 10.1109/TIP.2020.3046864. Epub 2021 Jan 11.
10
Strong Generalized Speech Emotion Recognition Based on Effective Data Augmentation.
Entropy (Basel). 2022 Dec 30;25(1):68. doi: 10.3390/e25010068.

Cited By

1
Emotion Recognition in the Real-World: Passively Collecting and Estimating Emotions from Natural Speech Data of Individuals with Bipolar Disorder.
IEEE Trans Affect Comput. 2025 Jan-Mar;16(1):28-40. doi: 10.1109/taffc.2024.3407683. Epub 2024 May 30.
2
Enhancing depression recognition through a mixed expert model by integrating speaker-related and emotion-related features.
Sci Rep. 2025 Feb 3;15(1):4064. doi: 10.1038/s41598-025-88313-9.
3
MelTrans: Mel-Spectrogram Relationship-Learning for Speech Emotion Recognition via Transformers.
Sensors (Basel). 2024 Aug 25;24(17):5506. doi: 10.3390/s24175506.

References

1
ECOLOGICALLY VALID LONG-TERM MOOD MONITORING OF INDIVIDUALS WITH BIPOLAR DISORDER USING SPEECH.
Proc IEEE Int Conf Acoust Speech Signal Process. 2014 May;2014:4858-4862. doi: 10.1109/ICASSP.2014.6854525. Epub 2014 Jul 14.
2
MOOD STATE PREDICTION FROM SPEECH OF VARYING ACOUSTIC QUALITY FOR INDIVIDUALS WITH BIPOLAR DISORDER.
Proc IEEE Int Conf Acoust Speech Signal Process. 2016 Mar;2016:2359-2363. doi: 10.1109/ICASSP.2016.7472099.
3
Core affect and the psychological construction of emotion.
Psychol Rev. 2003 Jan;110(1):145-72. doi: 10.1037/0033-295x.110.1.145.
4
An Emotion-Driven Vocal Biomarker-Based PTSD Screening Tool.
IEEE Open J Eng Med Biol. 2023 Jun 13;5:621-626. doi: 10.1109/OJEMB.2023.3284798. eCollection 2024.
5
Mobile Acceptance and Commitment Therapy in Bipolar Disorder: Microrandomized Trial.
JMIR Ment Health. 2023 Apr 20;10:e43164. doi: 10.2196/43164.
6
Aligning Small Datasets Using Domain Adversarial Learning: Applications in Automated in Vivo Oral Cancer Diagnosis.
IEEE J Biomed Health Inform. 2023 Jan;27(1):457-468. doi: 10.1109/JBHI.2022.3217015. Epub 2023 Jan 4.
7
Progressive distribution adapted neural networks for cross-corpus speech emotion recognition.
Front Neurorobot. 2022 Sep 15;16:987146. doi: 10.3389/fnbot.2022.987146. eCollection 2022.
8
Adapting Multiple Distributions for Bridging Emotions from Different Speech Corpora.
Entropy (Basel). 2022 Sep 5;24(9):1250. doi: 10.3390/e24091250.
9
Progressively Discriminative Transfer Network for Cross-Corpus Speech Emotion Recognition.
Entropy (Basel). 2022 Jul 29;24(8):1046. doi: 10.3390/e24081046.
10
Deep Cross-Corpus Speech Emotion Recognition: Recent Advances and Perspectives.
Front Neurorobot. 2021 Nov 29;15:784514. doi: 10.3389/fnbot.2021.784514. eCollection 2021.