Selective Acoustic Feature Enhancement for Speech Emotion Recognition With Noisy Speech.

Authors

Leem Seong-Gyun, Fulford Daniel, Onnela Jukka-Pekka, Gard David, Busso Carlos

Affiliations

Department of Electrical and Computer Engineering, University of Texas at Dallas, Richardson, TX 75080 USA.

Occupational Therapy and Psychological and Brain Sciences, Boston University, Boston, MA 02215 USA.

Publication information

IEEE/ACM Trans Audio Speech Lang Process. 2024;32:917-929. doi: 10.1109/taslp.2023.3340603. Epub 2023 Dec 7.

Abstract

A speech emotion recognition (SER) system deployed in a real-world application can encounter speech contaminated with unconstrained background noise. To deal with this issue, a speech enhancement (SE) module can be attached to the SER system to compensate for the environmental difference of an input. Although the SE module can improve the quality and intelligibility of a given speech signal, it risks altering discriminative acoustic features for SER that are already resilient to environmental differences. Exploring this idea, we propose to enhance only the weak features that degrade emotion recognition performance. Our model first identifies weak feature sets by using multiple models, each trained on clean speech with one acoustic feature at a time. After training the single-feature models, we rank each speech feature according to three criteria: performance, robustness, and a joint ranking that combines performance and robustness. We group the weak features by cumulatively adding features from the bottom to the top of each ranking. Once the weak feature set is defined, we enhance only those weak features, keeping the resilient features unchanged. We implement these ideas with low-level descriptors (LLDs). We show that directly enhancing the weak LLDs leads to better performance than extracting LLDs from an enhanced speech signal. Our experiment with clean and noisy versions of the MSP-Podcast corpus shows that the proposed approach yields performance gains of 17.7% (arousal), 21.2% (dominance), and 3.3% (valence) over a system that enhances all the LLDs under the 10 dB signal-to-noise ratio (SNR) condition.
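
The selection procedure in the abstract (rank the features, grow weak sets from the bottom of a ranking, enhance only the weak features) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The feature names, the score values, the rank-sum rule used for the joint ranking, and the `enhance` callable are all assumptions made for illustration; the abstract does not say how the joint ranking is computed or how the enhancement model is built.

```python
from typing import Callable, Dict, List

Frame = Dict[str, float]  # one frame of LLD values keyed by feature name


def rank_features(scores: Dict[str, float]) -> List[str]:
    """Order feature names from best (highest score) to worst."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)


def joint_ranking(performance: Dict[str, float],
                  robustness: Dict[str, float]) -> List[str]:
    """Combine two rankings by summing rank positions (assumed rule).

    Ties are broken by the performance rank.
    """
    perf_pos = {n: i for i, n in enumerate(rank_features(performance))}
    rob_pos = {n: i for i, n in enumerate(rank_features(robustness))}
    return sorted(performance,
                  key=lambda n: (perf_pos[n] + rob_pos[n], perf_pos[n]))


def cumulative_weak_sets(ranking: List[str]) -> List[List[str]]:
    """Grow candidate weak sets from the bottom of the ranking upward."""
    bottom_up = list(reversed(ranking))
    return [bottom_up[:k] for k in range(1, len(bottom_up) + 1)]


def enhance_selected(frames: List[Frame], weak: List[str],
                     enhance: Callable[[str, float], float]) -> List[Frame]:
    """Apply `enhance` to weak features only; leave the rest unchanged."""
    weak_set = set(weak)
    return [{name: enhance(name, value) if name in weak_set else value
             for name, value in frame.items()}
            for frame in frames]


# Hypothetical single-feature scores (higher is better for both criteria).
performance = {"f0": 0.6, "loudness": 0.5, "mfcc1": 0.3, "jitter": 0.2}
robustness = {"f0": 0.9, "loudness": 0.4, "mfcc1": 0.7, "jitter": 0.1}

ranking = joint_ranking(performance, robustness)
weak_sets = cumulative_weak_sets(ranking)

# Placeholder enhancer: halves the value; stands in for a trained model.
noisy = [{"f0": 1.0, "loudness": 4.0, "mfcc1": 3.0, "jitter": 2.0}]
cleaned = enhance_selected(noisy, weak_sets[1], lambda name, v: v * 0.5)
```

In this sketch each entry of `weak_sets` is one candidate weak feature set. A full pipeline would evaluate the SER model with each candidate and keep the best one, a step the abstract implies but does not detail.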

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ad52/11250502/6ce5cd453889/nihms-1955419-f0006.jpg
