Speech emotion recognition via graph-based representations.

Affiliations

Institute of Computer Science, Foundation for Research and Technology-Hellas, Heraklion, GR-700 13, Greece.

Computer Science Department, University of Crete, Heraklion, GR-700 13, Greece.

Publication information

Sci Rep. 2024 Feb 23;14(1):4484. doi: 10.1038/s41598-024-52989-2.

DOI:10.1038/s41598-024-52989-2

PMID:38396002

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10891082/

Abstract

Speech emotion recognition (SER) has attracted growing interest over the last decades as part of enriched affective computing. Consequently, a variety of engineering approaches to the SER problem have been developed, exploiting different features, learning algorithms, and datasets. In this paper, we propose applying graph theory to the classification of emotionally colored speech signals. Graph theory provides tools for extracting statistical as well as structural information from any time series, and we propose using this information as a novel feature set. Furthermore, we suggest assigning a unique feature-based identity to each emotion of each speaker. Emotion classification is performed by a Random Forest classifier in a Leave-One-Speaker-Out Cross-Validation (LOSO-CV) scheme. The proposed method is compared with two state-of-the-art approaches involving well-known hand-crafted features as well as deep learning architectures operating on mel-spectrograms. Experimental results on three datasets, EMODB (German, acted), AESDD (Greek, acted), and DEMoS (Italian, in-the-wild), show that the proposed method outperforms the comparison methods on all three. Specifically, we observe an average increase in unweighted average recall (UAR) of almost [Formula: see text], [Formula: see text] and [Formula: see text], respectively.
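The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not say which graph construction is used, so the natural visibility graph below is an assumption chosen only because it is a common way to map a time series to a graph. The three degree-based features, and the helper names visibility_edges, graph_features, loso_splits and uar, are likewise hypothetical. The Random Forest step is omitted to keep the sketch dependency-free. Only the speaker-wise splitting and the UAR metric are shown.

```python
def visibility_edges(series):
    """Edge set of the natural visibility graph of a sequence (assumed construction)."""
    n = len(series)
    edges = set()
    for i in range(n - 1):
        for j in range(i + 1, n):
            yi, yj = series[i], series[j]
            # Samples i and j are connected when every sample between them
            # lies strictly below the straight line joining (i, yi) and (j, yj).
            if all(series[k] < yj + (yi - yj) * (j - k) / (j - i)
                   for k in range(i + 1, j)):
                edges.add((i, j))
    return edges


def graph_features(series):
    """A few illustrative statistical/structural descriptors of the graph."""
    n = len(series)
    edges = visibility_edges(series)
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return {
        "mean_degree": sum(degree) / n,
        "max_degree": max(degree),
        "density": 2 * len(edges) / (n * (n - 1)),
    }


def loso_splits(speakers):
    """Yield (held_out_speaker, train_indices, test_indices), one fold per speaker."""
    for held_out in sorted(set(speakers)):
        train = [i for i, s in enumerate(speakers) if s != held_out]
        test = [i for i, s in enumerate(speakers) if s == held_out]
        yield held_out, train, test


def uar(y_true, y_pred):
    """Unweighted average recall: the mean of the per-class recalls."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)
```

In a full experiment, each fold from loso_splits would train a classifier on the feature vectors of the training speakers and score the held-out speaker with uar. Because UAR averages recall per class rather than per utterance, it is not dominated by the most frequent emotion.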

