Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech.

Author Information

Cao Houwei, Verma Ragini, Nenkova Ani

Affiliations

Department of Radiology, Section of Biomedical Image Analysis, University of Pennsylvania, 3600 Market Street, Suite 380, Philadelphia, PA 19104, United States.

Department of Computer and Information Science, University of Pennsylvania, 3330 Walnut Street, Philadelphia, PA 19104, United States.

Publication Information

Comput Speech Lang. 2015 Jan;29(1):186-202. doi: 10.1016/j.csl.2014.01.003.

Abstract

We introduce a ranking approach for emotion recognition which naturally incorporates information about the general expressivity of speakers. We demonstrate that our approach leads to substantial gains in accuracy compared to conventional approaches. We train ranking SVMs for individual emotions, treating the data from each speaker as a separate query, and combine the predictions from all rankers to perform multi-class prediction. The ranking method provides two natural benefits. It captures speaker-specific information even in speaker-independent training/testing conditions. It also incorporates the intuition that each utterance can express a mix of possible emotions and that considering the degree to which each emotion is expressed can be productively exploited to identify the dominant emotion. We compare the performance of the rankers and their combination to standard SVM classification approaches on two publicly available datasets of acted emotional speech, Berlin and LDC, as well as on spontaneous emotional data from the FAU Aibo dataset. On acted data, ranking approaches exhibit significantly better performance compared to SVM classification both in distinguishing a specific emotion from all others and in multi-class prediction. On the spontaneous data, which contains mostly neutral utterances with a relatively small portion of less intense emotional utterances, ranking-based classifiers again achieve much higher precision in identifying emotional utterances than conventional SVM classifiers. In addition, we discuss the complementarity of conventional SVM and ranking-based classifiers. On all three datasets we find dramatically higher accuracy for the test items on whose prediction the two methods agree compared to the accuracy of individual methods. Furthermore, on the spontaneous data the ranking and standard classification are complementary and we obtain marked improvement when we combine the two classifiers by late-stage fusion.
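
The approach described above amounts to training one ranking SVM per emotion, with each speaker's utterances forming a separate query, and then combining the rankers' scores for multi-class prediction. The sketch below is only an illustration of that idea, not the authors' implementation: it approximates RankSVM with the standard pairwise-difference reduction to a linear binary SVM, and the feature matrix, emotion labels, speaker IDs, and the argmax combination rule are assumptions made here for concreteness.

```python
# Illustrative sketch only (not the paper's code): per-emotion ranking SVMs with
# each speaker treated as a separate query, combined by taking the highest-scoring
# emotion. RankSVM is approximated by the usual pairwise-difference reduction to a
# linear binary SVM; features, labels, speaker IDs, and the fusion rule are
# placeholders.
import numpy as np
from sklearn.svm import LinearSVC


def pairwise_differences(X, relevance, speakers):
    """Build pairwise difference vectors within each speaker (query): an utterance
    of the target emotion (relevance 1) should rank above the same speaker's other
    utterances (relevance 0)."""
    diffs, signs = [], []
    for spk in np.unique(speakers):
        idx = np.flatnonzero(speakers == spk)
        pos = [i for i in idx if relevance[i] == 1]
        neg = [i for i in idx if relevance[i] == 0]
        for i in pos:
            for j in neg:
                diffs.append(X[i] - X[j])
                signs.append(1)
                diffs.append(X[j] - X[i])
                signs.append(-1)
    return np.asarray(diffs), np.asarray(signs)


class EmotionRanker:
    """One ranker per emotion; its linear score orders utterances by how strongly
    they express that emotion relative to the same speaker's other utterances."""

    def __init__(self, emotion, C=1.0):
        self.emotion = emotion
        self.svm = LinearSVC(C=C)

    def fit(self, X, labels, speakers):
        relevance = (labels == self.emotion).astype(int)
        pairs, signs = pairwise_differences(X, relevance, speakers)
        self.svm.fit(pairs, signs)
        return self

    def score(self, X):
        # Ranking score = projection of each utterance onto the learned weights.
        return X @ self.svm.coef_.ravel()


def predict_emotions(rankers, X):
    """Combine the per-emotion rankers for multi-class prediction by choosing the
    emotion whose ranker assigns the highest score (a simple stand-in for the
    combination step described in the abstract)."""
    scores = np.column_stack([r.score(X) for r in rankers])
    return np.array([rankers[i].emotion for i in scores.argmax(axis=1)])
```

In the speaker-independent setting the paper targets, the speakers used to form training pairs would be disjoint from the test speakers; because every pair is built within a single speaker, such a ranker responds to relative differences in a speaker's expressivity rather than to absolute feature values, which is the intuition behind the reported gains.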


Similar Articles

1. Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech. Comput Speech Lang. 2015 Jan;29(1):186-202. doi: 10.1016/j.csl.2014.01.003.
3. A comprehensive study on bilingual and multilingual speech emotion recognition using a two-pass classification scheme. PLoS One. 2019 Aug 15;14(8):e0220386. doi: 10.1371/journal.pone.0220386. eCollection 2019.
5. Ensemble learning with speaker embeddings in multiple speech task stimuli for depression detection. Front Neurosci. 2023 Mar 23;17:1141621. doi: 10.3389/fnins.2023.1141621. eCollection 2023.
6. Effect on speech emotion classification of a feature selection approach using a convolutional neural network. PeerJ Comput Sci. 2021 Nov 3;7:e766. doi: 10.7717/peerj-cs.766. eCollection 2021.
7. Class-Level Spectral Features for Emotion Recognition. Speech Commun. 2010 Jul;52(7-8):613-625. doi: 10.1016/j.specom.2010.02.010.
8. Effects of Data Augmentations on Speech Emotion Recognition. Sensors (Basel). 2022 Aug 9;22(16):5941. doi: 10.3390/s22165941.
9. Comparing Manual and Machine Annotations of Emotions in Non-acted Speech. Annu Int Conf IEEE Eng Med Biol Soc. 2018 Jul;2018:4241-4244. doi: 10.1109/EMBC.2018.8513230.
10. Action Unit Models of Facial Expression of Emotion in the Presence of Speech. Int Conf Affect Comput Intell Interact Workshops. 2013 Sep;2013:49-54. doi: 10.1109/ACII.2013.15.

Cited By

1. Deep Cross-Corpus Speech Emotion Recognition: Recent Advances and Perspectives. Front Neurorobot. 2021 Nov 29;15:784514. doi: 10.3389/fnbot.2021.784514. eCollection 2021.

References

1. Class-Level Spectral Features for Emotion Recognition. Speech Commun. 2010 Jul;52(7-8):613-625. doi: 10.1016/j.specom.2010.02.010.
2. Supervised and Unsupervised Feature Selection for Inferring Social Nature of Telephone Conversations from Their Content. Proc IEEE Workshop Autom Speech Recognit Underst. 2008 Apr 3;1:378-384. doi: 10.1109/ICCV.2003.1238369. Epub 2003 Oct 13.
