

On the Speech Properties and Feature Extraction Methods in Speech Emotion Recognition.

Affiliations

Institute of Multimedia Information and Communication Technologies, Faculty of Electrical Engineering and Information Technology, Slovak University of Technology in Bratislava, 2412 Bratislava, Slovakia.

Institute of Robotics and Cybernetics, Faculty of Electrical Engineering and Information Technology, Slovak University of Technology in Bratislava, 2412 Bratislava, Slovakia.

Publication Information

Sensors (Basel). 2021 Mar 8;21(5):1888. doi: 10.3390/s21051888.

DOI: 10.3390/s21051888
PMID: 33800348
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7962835/
Abstract

Many speech emotion recognition systems have been designed using different features and classification methods. Still, there is a lack of knowledge and reasoning regarding the underlying speech characteristics and processing, i.e., how basic characteristics, methods, and settings affect the accuracy, to what extent, etc. This study aims to extend the physical perspective on speech emotion recognition by analyzing basic speech characteristics and modeling methods, e.g., time characteristics (segmentation, window types, and classification region lengths and overlaps), frequency ranges, frequency scales, processing of whole speech (spectrograms), vocal tract (filter banks, linear prediction coefficient (LPC) modeling), and excitation (inverse LPC filtering) signals, magnitude and phase manipulations, cepstral features, etc. In the evaluation phase, a state-of-the-art classification method and rigorous statistical tests were applied, namely N-fold cross validation, the paired t-test, and rank and Pearson correlations. The results revealed several settings in a 75% accuracy range (seven emotions). The most successful methods were based on vocal tract features using psychoacoustic filter banks covering the 0-8 kHz frequency range. Spectrograms carrying vocal tract and excitation information also score well. It was found that even basic processing like pre-emphasis, segmentation, magnitude modifications, etc., can dramatically affect the results. Most findings are robust, exhibiting strong correlations across the tested databases.
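The time-domain front end named in the abstract (pre-emphasis, segmentation into overlapping windowed frames) can be sketched as below. This is a minimal illustration, not the paper's pipeline: the 0.97 pre-emphasis coefficient, 25 ms frame length, 50% overlap, and Hamming window are common textbook defaults, and the input is a synthetic sine rather than speech.

```python
import numpy as np

def preemphasize(signal, alpha=0.97):
    """First-order pre-emphasis filter: y[n] = x[n] - alpha * x[n-1]."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_signal(signal, sample_rate, frame_ms=25.0, overlap=0.5):
    """Split a signal into overlapping frames and apply a Hamming window."""
    frame_len = int(sample_rate * frame_ms / 1000.0)
    hop = max(1, int(frame_len * (1.0 - overlap)))
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )
    return frames * np.hamming(frame_len)

# Synthetic 1-second, 16 kHz sine as a stand-in for a speech utterance
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 220 * t)

emphasized = preemphasize(x)
frames = frame_signal(emphasized, sr, frame_ms=25.0, overlap=0.5)
print(frames.shape)  # 25 ms at 16 kHz -> 400-sample frames, 200-sample hop
```

Per the study's findings, choices made at exactly this stage (window type, frame length, overlap) are among the settings that can measurably shift recognition accuracy.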

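The vocal tract / excitation split described in the abstract (LPC modeling of the vocal tract, inverse LPC filtering to recover the excitation) can be sketched as follows. This is an assumed, simplified illustration: the two-sinusoid "voiced" frame is synthetic, and LPC order 4 is chosen only because it suffices for two sinusoids; real speech front ends typically use higher orders.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Solve the Yule-Walker (autocorrelation) equations for LPC coefficients."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1 : order + 1])
    return np.concatenate(([1.0], -a))  # A(z) = 1 - sum_k a_k z^{-k}

def inverse_lpc_filter(frame, A):
    """FIR-filter the frame with A(z) to expose the excitation (residual)."""
    return np.convolve(frame, A, mode="full")[: len(frame)]

# Synthetic 30 ms "voiced" frame at 8 kHz: two harmonics plus a trace of noise
sr = 8000
rng = np.random.default_rng(0)
t = np.arange(int(0.03 * sr)) / sr
frame = np.sin(2 * np.pi * 150 * t) + 0.5 * np.sin(2 * np.pi * 450 * t)
frame = frame + 1e-3 * rng.standard_normal(len(t))

A = lpc_coefficients(frame, order=4)
residual = inverse_lpc_filter(frame, A)
ratio = residual @ residual / (frame @ frame)
print(f"residual/frame energy ratio: {ratio:.4f}")
```

When the all-pole model fits the frame, nearly all of the energy is explained by A(z) and the residual (excitation estimate) is small; features can then be drawn separately from the model (vocal tract) and the residual (excitation), as the study does.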

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c26/7962835/3a92e09d3d47/sensors-21-01888-g021.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c26/7962835/80dc94fedb0e/sensors-21-01888-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c26/7962835/82a465e7acac/sensors-21-01888-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c26/7962835/cc322f79d679/sensors-21-01888-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c26/7962835/564fd6e534ff/sensors-21-01888-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c26/7962835/2b713621809d/sensors-21-01888-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c26/7962835/e90c387bf9a8/sensors-21-01888-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c26/7962835/e29192e32b52/sensors-21-01888-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c26/7962835/66c45807157d/sensors-21-01888-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c26/7962835/2f1401993478/sensors-21-01888-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c26/7962835/e287cee53d49/sensors-21-01888-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c26/7962835/6053e008e097/sensors-21-01888-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c26/7962835/aeb79415863c/sensors-21-01888-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c26/7962835/f28d6884d206/sensors-21-01888-g013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c26/7962835/963fc8ecf162/sensors-21-01888-g014.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c26/7962835/e0d8a58f6c37/sensors-21-01888-g015.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c26/7962835/705d3243a8c8/sensors-21-01888-g016.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c26/7962835/1ac11031c85c/sensors-21-01888-g017.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c26/7962835/baa6ba6e21d4/sensors-21-01888-g018.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c26/7962835/b3fbc7a1334f/sensors-21-01888-g019.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c26/7962835/4fa81844ea5a/sensors-21-01888-g020.jpg

Similar Articles

1. On the Speech Properties and Feature Extraction Methods in Speech Emotion Recognition.
   Sensors (Basel). 2021 Mar 8;21(5):1888. doi: 10.3390/s21051888.
2. Multiscale Amplitude Feature and Significance of Enhanced Vocal Tract Information for Emotion Classification.
   IEEE Trans Cybern. 2018 Jan 8. doi: 10.1109/TCYB.2017.2787717.
3. Enhancing Speech Emotion Recognition Using Dual Feature Extraction Encoders.
   Sensors (Basel). 2023 Jul 24;23(14):6640. doi: 10.3390/s23146640.
4. Emotion Recognition from Chinese Speech for Smart Affective Services Using a Combination of SVM and DBN.
   Sensors (Basel). 2017 Jul 24;17(7):1694. doi: 10.3390/s17071694.
5. Detecting emotional valence using time-domain analysis of speech signals.
   Annu Int Conf IEEE Eng Med Biol Soc. 2019 Jul;2019:3605-3608. doi: 10.1109/EMBC.2019.8857691.
6. 3D CNN-Based Speech Emotion Recognition Using K-Means Clustering and Spectrograms.
   Entropy (Basel). 2019 May 8;21(5):479. doi: 10.3390/e21050479.
7. Time-frequency feature representation using multi-resolution texture analysis and acoustic activity detector for real-life speech emotion recognition.
   Sensors (Basel). 2015 Jan 14;15(1):1458-78. doi: 10.3390/s150101458.
8. Stressed Speech Emotion Recognition Using Teager Energy and Spectral Feature Fusion with Feature Optimization.
   Comput Intell Neurosci. 2023 Oct 11;2023:5765760. doi: 10.1155/2023/5765760. eCollection 2023.
9. Frequency, Time, Representation and Modeling Aspects for Major Speech and Audio Processing Applications.
   Sensors (Basel). 2022 Aug 22;22(16):6304. doi: 10.3390/s22166304.
10. Intelligibility of emotional speech in younger and older adults.
    Ear Hear. 2014 Nov-Dec;35(6):695-707. doi: 10.1097/AUD.0000000000000082.

Cited By

1. Effectiveness of a Biofeedback Intervention Targeting Mental and Physical Health Among College Students Through Speech and Physiology as Biomarkers Using Machine Learning: A Randomized Controlled Trial.
   Appl Psychophysiol Biofeedback. 2024 Mar;49(1):71-83. doi: 10.1007/s10484-023-09612-3. Epub 2024 Jan 2.
2. End-to-End Model-Based Detection of Infants with Autism Spectrum Disorder Using a Pretrained Model.
   Sensors (Basel). 2022 Dec 25;23(1):202. doi: 10.3390/s23010202.
3. Speech Emotion Recognition Based on Modified ReliefF.
   Sensors (Basel). 2022 Oct 25;22(21):8152. doi: 10.3390/s22218152.
4. Improved Feature Parameter Extraction from Speech Signals Using Machine Learning Algorithm.
   Sensors (Basel). 2022 Oct 24;22(21):8122. doi: 10.3390/s22218122.
5. Global and local feature fusion long and short-term memory mechanism for dance emotion recognition in robot.
   Front Neurorobot. 2022 Aug 24;16:998568. doi: 10.3389/fnbot.2022.998568. eCollection 2022.
6. Frequency, Time, Representation and Modeling Aspects for Major Speech and Audio Processing Applications.
   Sensors (Basel). 2022 Aug 22;22(16):6304. doi: 10.3390/s22166304.
7. The Emotion Probe: On the Universality of Cross-Linguistic and Cross-Gender Speech Emotion Recognition via Machine Learning.
   Sensors (Basel). 2022 Mar 23;22(7):2461. doi: 10.3390/s22072461.
8. Human-Computer Interaction with Detection of Speaker Emotions Using Convolution Neural Networks.
   Comput Intell Neurosci. 2022 Mar 31;2022:7463091. doi: 10.1155/2022/7463091. eCollection 2022.

References

1. Deep-Net: A Lightweight CNN-Based Speech Emotion Recognition System Using Deep Frequency Features.
   Sensors (Basel). 2020 Sep 12;20(18):5212. doi: 10.3390/s20185212.
2. Speech Emotion Recognition with Heterogeneous Feature Unification of Deep Neural Network.
   Sensors (Basel). 2019 Jun 18;19(12):2730. doi: 10.3390/s19122730.
3. Mental status assessment of disaster relief personnel by vocal affect display based on voice emotion recognition.
   Disaster Mil Med. 2017 Apr 8;3:4. doi: 10.1186/s40696-017-0032-0. eCollection 2017.
4. Evaluating deep learning architectures for Speech Emotion Recognition.
   Neural Netw. 2017 Aug;92:60-68. doi: 10.1016/j.neunet.2017.02.013. Epub 2017 Mar 21.
5. Deep learning in neural networks: an overview.
   Neural Netw. 2015 Jan;61:85-117. doi: 10.1016/j.neunet.2014.09.003. Epub 2014 Oct 13.
6. The circumplex model of affect: an integrative approach to affective neuroscience, cognitive development, and psychopathology.
   Dev Psychopathol. 2005 Summer;17(3):715-34. doi: 10.1017/S0954579405050340.