Gideon John, McInnis Melvin G, Provost Emily Mower
University of Michigan, Ann Arbor, MI, USA.
IEEE Trans Affect Comput. 2021 Oct-Dec;12(4):1055-1068. doi: 10.1109/taffc.2019.2916092. Epub 2019 May 14.
Automatic speech emotion recognition provides computers with critical context to enable user understanding. While methods trained and tested within the same dataset have been shown to be successful, they often fail when applied to unseen datasets. To address this, recent work has focused on adversarial methods to find more generalized representations of emotional speech. However, many of these methods have trouble converging and involve only datasets collected in laboratory conditions. In this paper, we introduce Adversarial Discriminative Domain Generalization (ADDoG), which follows an easier-to-train "meet in the middle" approach. The model iteratively moves the representations learned for each dataset closer to one another, improving cross-dataset generalization. We also introduce Multiclass ADDoG, or MADDoG, which extends the proposed method to more than two datasets simultaneously. Our results show consistent convergence for the introduced methods, with significantly improved results when labels from the target dataset are not used. We also show how, in most cases, ADDoG and MADDoG can be used to improve upon baseline state-of-the-art methods when target dataset labels are added and in-the-wild data are considered. Although our experiments focus on cross-corpus speech emotion, these methods could be used to remove unwanted factors of variation in other settings.
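The "meet in the middle" idea can be illustrated with a deliberately simplified toy. The sketch below is not the ADDoG model: it replaces the adversarial training described in the abstract with a plain mean-matching loss on one-dimensional features, and every name in it (`meet_in_the_middle`, `shift_a`, `shift_b`, `lr`, `steps`) is a hypothetical choice made for illustration. It shows only that when both datasets' representations are updated symmetrically, each moves halfway toward the other instead of one being forced onto the other.

```python
def mean(xs):
    """Arithmetic mean of a non-empty list of floats."""
    return sum(xs) / len(xs)


def meet_in_the_middle(feats_a, feats_b, lr=0.1, steps=20):
    """Shift two sets of 1-D features toward each other.

    Toy assumption: each dataset has one learnable offset, and both
    offsets are updated by gradient descent on the squared gap between
    the shifted dataset means. This stands in for the adversarial
    objective; it is not the method from the paper.
    """
    shift_a, shift_b = 0.0, 0.0
    gaps = []  # absolute gap between shifted means before each update
    for _ in range(steps):
        gap = (mean(feats_a) + shift_a) - (mean(feats_b) + shift_b)
        gaps.append(abs(gap))
        # d(gap^2)/d(shift_a) = 2 * gap; d(gap^2)/d(shift_b) = -2 * gap
        shift_a -= lr * 2 * gap
        shift_b += lr * 2 * gap
    return shift_a, shift_b, gaps


# Two hypothetical "datasets" whose features differ by a constant offset.
dataset_a = [1.0, 2.0, 3.0]
dataset_b = [5.0, 6.0, 7.0]
shift_a, shift_b, gaps = meet_in_the_middle(dataset_a, dataset_b)
print(shift_a, shift_b, gaps[0], gaps[-1])
```

In this toy, the two offsets are equal in magnitude and opposite in sign, so the shifted means converge on the midpoint of the original means. A real system would learn a shared encoder and use a learned critic in place of the fixed mean-matching loss.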