Multiscale Amplitude Feature and Significance of Enhanced Vocal Tract Information for Emotion Classification.

Authors

Deb Suman, Dandapat Samarendra

Publication information

IEEE Trans Cybern. 2018 Jan 8. doi: 10.1109/TCYB.2017.2787717.

DOI:10.1109/TCYB.2017.2787717
PMID:29993975
Abstract

In this paper, a novel multiscale amplitude feature is proposed using multiresolution analysis (MRA), and the significance of the vocal tract is investigated for emotion classification from the speech signal. MRA decomposes the speech signal into a number of sub-band signals. The proposed feature is computed by applying a sinusoidal model to each sub-band signal. Different emotions have different impacts on the vocal tract. As a result, the vocal tract responds in a unique way for each emotion. The vocal tract information is enhanced using pre-emphasis, so that the emotion information manifested in the vocal tract can be well exploited. This may help in improving the performance of emotion classification. Emotion recognition is performed using the German emotional database (EMODB), the interactive emotional dyadic motion capture database, the simulated stressed speech database, and the FAU AIBO database, with both the speech signal and speech with enhanced vocal tract information (SEVTI). The performance of the proposed multiscale amplitude feature is compared with three different types of features: 1) the mel frequency cepstral coefficients; 2) the Teager energy operator (TEO)-based feature (TEO-CB-Auto-Env); and 3) the breathiness feature. The proposed feature outperforms the other features. In terms of recognition rates, the features derived from the SEVTI signal give better performance than the features derived from the speech signal. The combination of the features with the SEVTI signal shows an average recognition rate of 86.7% on the EMODB database.
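The pipeline outlined in the abstract (pre-emphasis, sub-band decomposition, per-band amplitude estimation) can be illustrated with a minimal sketch. The abstract does not specify the wavelet, the number of decomposition levels, the pre-emphasis coefficient, or how the sinusoidal-model amplitudes are estimated, so everything concrete here is an assumption: a Haar wavelet stands in for the MRA, `alpha=0.97` is a commonly used pre-emphasis value, and picking the largest spectral magnitudes is a crude stand-in for sinusoidal-model amplitudes. All function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1] (alpha is assumed)."""
    x = np.asarray(x, dtype=float)
    return np.concatenate(([x[0]], x[1:] - alpha * x[:-1]))

def haar_mra(x, levels=3):
    """Haar decomposition (assumed wavelet): [detail_1, ..., detail_L, approx_L]."""
    approx = np.asarray(x, dtype=float)
    bands = []
    for _ in range(levels):
        if len(approx) % 2:                      # zero-pad to an even length
            approx = np.append(approx, 0.0)
        even, odd = approx[0::2], approx[1::2]
        bands.append((even - odd) / np.sqrt(2.0))   # detail (high-pass) band
        approx = (even + odd) / np.sqrt(2.0)        # approximation (low-pass) band
    bands.append(approx)
    return bands

def peak_amplitudes(band, n_peaks=3):
    """Largest n_peaks spectral magnitudes: a stand-in for sinusoidal-model amplitudes."""
    spectrum = np.abs(np.fft.rfft(band * np.hanning(len(band))))
    top = np.sort(spectrum)[::-1][:n_peaks]
    return np.pad(top, (0, n_peaks - len(top)))     # zero-fill very short bands

def multiscale_amplitude_feature(x, levels=3, n_peaks=3, alpha=0.97, enhance=True):
    """Concatenate per-band amplitudes; enhance=True applies pre-emphasis first."""
    signal = pre_emphasis(x, alpha) if enhance else np.asarray(x, dtype=float)
    return np.concatenate([peak_amplitudes(b, n_peaks) for b in haar_mra(signal, levels)])

# Usage on a synthetic two-tone signal (not real speech).
fs = 8000
t = np.arange(1024) / fs
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 2000 * t)
feature = multiscale_amplitude_feature(x)
print(feature.shape)
```

Calling the function with `enhance=False` gives the feature from the plain signal, which mirrors the abstract's comparison between speech and pre-emphasized (SEVTI) input in spirit only.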


Similar articles

1. Multiscale Amplitude Feature and Significance of Enhanced Vocal Tract Information for Emotion Classification.
   IEEE Trans Cybern. 2018 Jan 8. doi: 10.1109/TCYB.2017.2787717.
2. Stressed Speech Emotion Recognition Using Teager Energy and Spectral Feature Fusion with Feature Optimization.
   Comput Intell Neurosci. 2023 Oct 11;2023:5765760. doi: 10.1155/2023/5765760. eCollection 2023.
3. Time-frequency feature representation using multi-resolution texture analysis and acoustic activity detector for real-life speech emotion recognition.
   Sensors (Basel). 2015 Jan 14;15(1):1458-78. doi: 10.3390/s150101458.
4. On the Speech Properties and Feature Extraction Methods in Speech Emotion Recognition.
   Sensors (Basel). 2021 Mar 8;21(5):1888. doi: 10.3390/s21051888.
5. A comprehensive study on bilingual and multilingual speech emotion recognition using a two-pass classification scheme.
   PLoS One. 2019 Aug 15;14(8):e0220386. doi: 10.1371/journal.pone.0220386. eCollection 2019.
6. An enhanced speech emotion recognition using vision transformer.
   Sci Rep. 2024 Jun 7;14(1):13126. doi: 10.1038/s41598-024-63776-4.
7. Speech emotion recognition via graph-based representations.
   Sci Rep. 2024 Feb 23;14(1):4484. doi: 10.1038/s41598-024-52989-2.
8. Combining a parallel 2D CNN with a self-attention Dilated Residual Network for CTC-based discrete speech emotion recognition.
   Neural Netw. 2021 Sep;141:52-60. doi: 10.1016/j.neunet.2021.03.013. Epub 2021 Mar 23.
9. Particle swarm optimization based feature enhancement and feature selection for improved emotion recognition in speech and glottal signals.
   PLoS One. 2015 Mar 23;10(3):e0120344. doi: 10.1371/journal.pone.0120344. eCollection 2015.
10. Self-Relation Attention and Temporal Awareness for Emotion Recognition via Vocal Burst.
    Sensors (Basel). 2022 Dec 24;23(1):200. doi: 10.3390/s23010200.

Cited by

1. A Hybrid Multimodal Emotion Recognition Framework for UX Evaluation Using Generalized Mixture Functions.
   Sensors (Basel). 2023 Apr 28;23(9):4373. doi: 10.3390/s23094373.
2. Bidirectional parallel echo state network for speech emotion recognition.
   Neural Comput Appl. 2022;34(20):17581-17599. doi: 10.1007/s00521-022-07410-2. Epub 2022 May 31.