Speech emotion recognition via graph-based representations.

Affiliations

Institute of Computer Science, Foundation for Research and Technology-Hellas, Heraklion, GR-700 13, Greece.

Computer Science Department, University of Crete, Heraklion, GR-700 13, Greece.

Publication information

Sci Rep. 2024 Feb 23;14(1):4484. doi: 10.1038/s41598-024-52989-2.

PMID: 38396002
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10891082/
Abstract

Speech emotion recognition (SER) has attracted increasing interest over recent decades as a means of enriching affective computing. As a result, a variety of engineering approaches have been developed to address the SER problem, exploiting different features, learning algorithms, and datasets. In this paper, we propose applying graph theory to the classification of emotionally colored speech signals. Graph theory provides tools for extracting both statistical and structural information from any time series, and we propose using this information as a novel feature set. Furthermore, we suggest assigning a unique feature-based identity to each emotion of each speaker. Emotion classification is performed by a Random Forest classifier under a Leave-One-Speaker-Out Cross-Validation (LOSO-CV) scheme. The proposed method is compared with two state-of-the-art approaches, involving well-known hand-crafted features and deep learning architectures operating on mel-spectrograms. Experimental results on three datasets, EMODB (German, acted), AESDD (Greek, acted), and DEMoS (Italian, in-the-wild), show that the proposed method outperforms the comparison methods on all three. Specifically, we observe an average increase in unweighted average recall (UAR) of almost [Formula: see text], [Formula: see text], and [Formula: see text], respectively.
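
The abstract does not specify how each speech signal is mapped to a graph or which graph descriptors make up the feature set. As an illustration only, the sketch below assumes the natural visibility graph of Lacasa et al. (PNAS, 2008), a standard way of turning a time series into a graph, applied to a short 1-D sequence such as a frame-level contour taken from an utterance. The functions `natural_visibility_graph`, `graph_features`, and `uar`, and the particular descriptors (degree statistics, density, average clustering), are hypothetical choices and not the authors' implementation. `uar` computes unweighted average recall, the metric in which the results are reported.

```python
from itertools import combinations
from statistics import mean, pstdev


def natural_visibility_graph(series):
    """Build the natural visibility graph of a 1-D sequence (Lacasa et al., 2008).

    Nodes are sample indices. Samples a < b are linked when every sample c
    between them lies strictly below the straight line from (a, y_a) to (b, y_b).
    Brute force, O(n^3): only practical for short sequences.
    """
    n = len(series)
    adj = {i: set() for i in range(n)}
    for a, b in combinations(range(n), 2):
        ya, yb = series[a], series[b]
        # Compare each intermediate y_c with the height of the a-b sightline at c.
        visible = all(
            series[c] < yb + (ya - yb) * (b - c) / (b - a)
            for c in range(a + 1, b)
        )
        if visible:  # consecutive samples (b == a + 1) are always visible
            adj[a].add(b)
            adj[b].add(a)
    return adj


def graph_features(adj):
    """Hypothetical descriptor set: degree statistics plus two structural measures.

    Expects a non-empty graph given as {node: set_of_neighbours}.
    """
    degrees = [len(nbrs) for nbrs in adj.values()]
    n = len(degrees)
    n_edges = sum(degrees) // 2
    density = 2 * n_edges / (n * (n - 1)) if n > 1 else 0.0
    clustering = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            clustering.append(0.0)  # clustering undefined for degree < 2; use 0
            continue
        # Count edges among this node's neighbours (closed triangles).
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        clustering.append(2 * links / (k * (k - 1)))
    return {
        "mean_degree": mean(degrees),
        "std_degree": pstdev(degrees),
        "max_degree": max(degrees),
        "density": density,
        "avg_clustering": mean(clustering),
    }


def uar(y_true, y_pred):
    """Unweighted average recall: per-class recall averaged with equal class weights."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return mean(recalls)


if __name__ == "__main__":
    adj = natural_visibility_graph([3.0, 1.0, 2.0, 4.0, 1.0])
    print(sorted(adj[3]))  # [0, 1, 2, 4]
    print(graph_features(adj))
    y_true = ["anger", "anger", "sad", "sad", "sad", "happy"]
    y_pred = ["anger", "sad", "sad", "sad", "anger", "happy"]
    print(round(uar(y_true, y_pred), 3))  # 0.722
```

In a LOSO-CV experiment, one such feature dictionary per utterance would be turned into a fixed-order vector and fed to a Random Forest; with scikit-learn this could be `RandomForestClassifier` evaluated with `LeaveOneGroupOut`, using speaker IDs as the groups, and `recall_score(..., average="macro")` computes the same quantity as `uar` when every predicted label also appears in `y_true`. The brute-force visibility test is cubic in sequence length, so long sequences would need framing or a faster construction.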

Figures

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eede/10891082/d25964858113/41598_2024_52989_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eede/10891082/bf68c72d31c7/41598_2024_52989_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eede/10891082/dc7fa8d26d68/41598_2024_52989_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eede/10891082/cd927c04fc41/41598_2024_52989_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eede/10891082/dd40d6f42d3d/41598_2024_52989_Fig5_HTML.jpg

Similar articles

1
Speech emotion recognition via graph-based representations.
Sci Rep. 2024 Feb 23;14(1):4484. doi: 10.1038/s41598-024-52989-2.
2
Impact of Feature Selection Algorithm on Speech Emotion Recognition Using Deep Convolutional Neural Network.
Sensors (Basel). 2020 Oct 23;20(21):6008. doi: 10.3390/s20216008.
3
An enhanced speech emotion recognition using vision transformer.
Sci Rep. 2024 Jun 7;14(1):13126. doi: 10.1038/s41598-024-63776-4.
4
MelTrans: Mel-Spectrogram Relationship-Learning for Speech Emotion Recognition via Transformers.
Sensors (Basel). 2024 Aug 25;24(17):5506. doi: 10.3390/s24175506.
5
Effect on speech emotion classification of a feature selection approach using a convolutional neural network.
PeerJ Comput Sci. 2021 Nov 3;7:e766. doi: 10.7717/peerj-cs.766. eCollection 2021.
6
Deep-Net: A Lightweight CNN-Based Speech Emotion Recognition System Using Deep Frequency Features.
Sensors (Basel). 2020 Sep 12;20(18):5212. doi: 10.3390/s20185212.
7
A Hybrid Time-Distributed Deep Neural Architecture for Speech Emotion Recognition.
Int J Neural Syst. 2022 Jun;32(6):2250024. doi: 10.1142/S0129065722500241. Epub 2022 May 12.
8
A comprehensive study on bilingual and multilingual speech emotion recognition using a two-pass classification scheme.
PLoS One. 2019 Aug 15;14(8):e0220386. doi: 10.1371/journal.pone.0220386. eCollection 2019.
9
Fusing Visual Attention CNN and Bag of Visual Words for Cross-Corpus Speech Emotion Recognition.
Sensors (Basel). 2020 Sep 28;20(19):5559. doi: 10.3390/s20195559.
10
Evaluating deep learning architectures for Speech Emotion Recognition.
Neural Netw. 2017 Aug;92:60-68. doi: 10.1016/j.neunet.2017.02.013. Epub 2017 Mar 21.
