Robust Multi-Scenario Speech-Based Emotion Recognition System.

Affiliations

Department of Signal Theory and Communications, University of Alcalá, 28805 Alcalá de Henares, Madrid, Spain.

Publication Information

Sensors (Basel). 2022 Mar 18;22(6):2343. doi: 10.3390/s22062343.

DOI: 10.3390/s22062343
PMID: 35336515
Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC8953251/
Abstract

Every human being experiences emotions daily, e.g., joy, sadness, fear, anger. These may be revealed through speech: our words are often accompanied by our emotional states when we talk. Different acoustic emotional databases are freely available for solving the Emotional Speech Recognition (ESR) task. Unfortunately, many of them were generated under non-real-world conditions: actors played the emotions, and recordings were made under fictitious, noise-free circumstances. Another weakness in the design of emotion recognition systems is the scarcity of patterns in the available databases, which causes generalization problems and leads to overfitting. This paper examines how different elements of the recording environment impact system performance, using a simple logistic regression algorithm. Specifically, we conducted experiments simulating different scenarios with different levels of Gaussian white noise, real-world noise, and reverberation. The results show a performance deterioration in all scenarios, with the error probability increasing from 25.57% to 79.13% in the worst case. Additionally, a virtual enlargement method and a robust multi-scenario speech-based emotion recognition system are proposed. Our system's average error probability of 34.57% is comparable to the best-case scenario's 31.55%. The findings support the prediction that simulated emotional speech databases do not offer sufficient closeness to real scenarios.
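The abstract's core experimental step, corrupting clean recordings with Gaussian white noise at controlled levels, amounts to scaling the noise so a target signal-to-noise ratio holds. A minimal sketch of that idea (the function name, the synthetic sine-wave "utterance", and the 5 dB level are illustrative assumptions, not the paper's actual pipeline):

```python
import numpy as np

def add_white_noise(signal: np.ndarray, snr_db: float) -> np.ndarray:
    """Corrupt a signal with Gaussian white noise at a target SNR in dB."""
    signal_power = np.mean(signal ** 2)
    # Pick noise power so that 10*log10(signal_power / noise_power) == snr_db
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.default_rng(0).normal(0.0, np.sqrt(noise_power),
                                            size=signal.shape)
    return signal + noise

# A synthetic 1-second "utterance" at 16 kHz, degraded to 5 dB SNR
t = np.linspace(0, 1, 16000, endpoint=False)
clean = np.sin(2 * np.pi * 220 * t)
noisy = add_white_noise(clean, snr_db=5.0)
```

Sweeping `snr_db` over a range of values is one way to reproduce the kind of graded degradation the paper reports, with error probability rising as the SNR falls.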

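The "virtual enlargement" method the abstract proposes can be read as a data-augmentation scheme: train the classifier on the union of clean and corrupted copies of the same utterances. A hedged sketch under that reading, using the simple logistic regression the paper names (the random feature matrix, four-class labels, and noise scale are placeholder assumptions standing in for real acoustic features):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: each row is an acoustic feature vector for one utterance,
# each label an emotion class. Real features would come from actual recordings.
rng = np.random.default_rng(42)
X_clean = rng.normal(size=(200, 20))
y = rng.integers(0, 4, size=200)              # four emotion classes

# "Virtual enlargement": add noise-corrupted copies of the same utterances
# to the training set, keeping the original emotion labels.
X_noisy = X_clean + rng.normal(scale=0.5, size=X_clean.shape)
X_train = np.vstack([X_clean, X_noisy])        # enlarged training set
y_train = np.concatenate([y, y])

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
```

Training on both clean and degraded views of each utterance is what lets a single model stay serviceable across scenarios, matching the paper's finding that the multi-scenario system's average error (34.57%) stays close to the best single-scenario case (31.55%).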

Similar Articles

1
Robust Multi-Scenario Speech-Based Emotion Recognition System.
Sensors (Basel). 2022 Mar 18;22(6):2343. doi: 10.3390/s22062343.
2
Intelligibility of emotional speech in younger and older adults.
Ear Hear. 2014 Nov-Dec;35(6):695-707. doi: 10.1097/AUD.0000000000000082.
3
A 5-emotions stimuli set for emotion perception research with full-body dance movements.
Sci Rep. 2023 May 30;13(1):8757. doi: 10.1038/s41598-023-33656-4.
4
Emotions in [a]: a perceptual and acoustic study.
Logoped Phoniatr Vocol. 2006;31(1):43-8. doi: 10.1080/14015430500293926.
5
Recognition of emotions in Mexican Spanish speech: an approach based on acoustic modelling of emotion-specific vowels.
ScientificWorldJournal. 2013 Jul 10;2013:162093. doi: 10.1155/2013/162093. Print 2013.
6
A New Network Structure for Speech Emotion Recognition Research.
Sensors (Basel). 2024 Feb 22;24(5):1429. doi: 10.3390/s24051429.
7
Perception and classification of emotions in nonsense speech: Humans versus machines.
PLoS One. 2023 Jan 30;18(1):e0281079. doi: 10.1371/journal.pone.0281079. eCollection 2023.
8
[Perception of emotional intonation of noisy speech signal with different acoustic parameters by adults of different age and gender].
Zh Vyssh Nerv Deiat Im I P Pavlova. 2011 May-Jun;61(3):306-16.
9
Crossmodal and incremental perception of audiovisual cues to emotional speech.
Lang Speech. 2010;53(Pt 1):3-30. doi: 10.1177/0023830909348993.
10
Emotional Speech Recognition Using Deep Neural Networks.
Sensors (Basel). 2022 Feb 12;22(4):1414. doi: 10.3390/s22041414.

Cited By

1
Cross-modal gated feature enhancement for multimodal emotion recognition in conversations.
Sci Rep. 2025 Aug 16;15(1):30004. doi: 10.1038/s41598-025-11989-6.
2
Predicting Treatment Outcomes in Patients with Low Back Pain Using Gene Signature-Based Machine Learning Models.
Pain Ther. 2025 Feb;14(1):359-373. doi: 10.1007/s40122-024-00700-8. Epub 2024 Dec 25.

References

1
A performance comparison of eight commercially available automatic classifiers for facial affect recognition.
PLoS One. 2020 Apr 24;15(4):e0231968. doi: 10.1371/journal.pone.0231968. eCollection 2020.
2
Sound frequency affects speech emotion perception: results from congenital amusia.
Front Psychol. 2015 Sep 8;6:1340. doi: 10.3389/fpsyg.2015.01340. eCollection 2015.
3
CREMA-D: Crowd-sourced Emotional Multimodal Actors Dataset.
IEEE Trans Affect Comput. 2014 Oct-Dec;5(4):377-390. doi: 10.1109/TAFFC.2014.2336244.
4
Vocal indices of stress: a review.
J Voice. 2013 May;27(3):390.e21-9. doi: 10.1016/j.jvoice.2012.12.010. Epub 2013 Feb 23.
5
Vocal affect expression: a review and a model for future research.
Psychol Bull. 1986 Mar;99(2):143-65.
6
Facial expressions of emotion: an old controversy and new findings.
Philos Trans R Soc Lond B Biol Sci. 1992 Jan 29;335(1273):63-9. doi: 10.1098/rstb.1992.0008.