

Building a Gender-Bias-Resistant Super Corpus as a Deep Learning Baseline for Speech Emotion Recognition.

Authors

Abbaschian Babak, Elmaghraby Adel

Affiliations

Computer Science and Engineering Department, University of Louisville, Louisville, KY 40292, USA.

Publication

Sensors (Basel). 2025 Mar 22;25(7):1991. doi: 10.3390/s25071991.

DOI: 10.3390/s25071991
PMID: 40218503
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11991078/
Abstract

The focus on Speech Emotion Recognition has dramatically increased in recent years, driven by the need for automatic speech-recognition-based systems and intelligent assistants to enhance user experience by incorporating emotional content. While deep learning techniques have significantly advanced SER systems, their robustness concerning speaker gender and out-of-distribution data has not been thoroughly examined. Furthermore, standards for SER remain rooted in landmark papers from the 2000s, even though modern deep learning architectures can achieve comparable or superior results to the state of the art of that era. In this research, we address these challenges by creating a new super corpus from existing databases, providing a larger pool of samples. We benchmark this dataset using various deep learning architectures, setting a new baseline for the task. Additionally, our experiments reveal that models trained on this super corpus demonstrate superior generalization and accuracy and exhibit lower gender bias compared to models trained on individual databases. We further show that traditional preprocessing techniques, such as denoising and normalization, are insufficient to address inherent biases in the data. However, our data augmentation approach effectively shifts these biases, improving model fairness across gender groups and emotions and, in some cases, fully debiasing the models.
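The abstract describes merging existing SER databases into one super corpus and then using data augmentation to reduce gender bias across emotion classes. A minimal sketch of that idea, under assumed conventions (the sample schema with `features`, `emotion`, and `gender` keys is hypothetical, and simple noise injection stands in for the paper's actual augmentation method):

```python
import random
from collections import Counter

def build_super_corpus(datasets):
    """Merge several SER datasets into one sample pool.

    Each dataset is assumed to be a list of dicts with
    'features', 'emotion', and 'gender' keys (hypothetical schema).
    """
    return [sample for dataset in datasets for sample in dataset]

def augment(sample, rng):
    """Placeholder augmentation: perturb features with small Gaussian noise.

    The paper's actual augmentation pipeline is not reproduced here.
    """
    noisy = [x + rng.gauss(0, 0.01) for x in sample["features"]]
    return {**sample, "features": noisy}

def balance_by_group(corpus, seed=0):
    """Oversample every (gender, emotion) group to the size of the largest,
    so no group dominates training."""
    rng = random.Random(seed)
    counts = Counter((s["gender"], s["emotion"]) for s in corpus)
    target = max(counts.values())
    balanced = list(corpus)
    for group, n in counts.items():
        members = [s for s in corpus
                   if (s["gender"], s["emotion"]) == group]
        # Add augmented copies until this group reaches the target size.
        for _ in range(target - n):
            balanced.append(augment(rng.choice(members), rng))
    return balanced
```

After balancing, each (gender, emotion) cell contributes the same number of samples, which is one simple way to keep a model from learning gender-correlated shortcuts; the paper's approach may differ in detail.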


Figures (from PMC11991078):
Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b2b3/11991078/dc81eb1a1945/sensors-25-01991-g001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b2b3/11991078/de5a447537da/sensors-25-01991-g002.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b2b3/11991078/15c108ae07cd/sensors-25-01991-g003.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b2b3/11991078/d652aa9c4714/sensors-25-01991-g004.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b2b3/11991078/5aee7cb20586/sensors-25-01991-g005.jpg
Figure 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b2b3/11991078/bb57e794109f/sensors-25-01991-g006.jpg
Figure 7: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b2b3/11991078/086f23789c66/sensors-25-01991-g007.jpg
Figure 8: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b2b3/11991078/ec2937cfc3de/sensors-25-01991-g008.jpg
Figure 9: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b2b3/11991078/3df51aef1dfc/sensors-25-01991-g009.jpg
Figure 10: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b2b3/11991078/5a708c5246fd/sensors-25-01991-g010.jpg

Similar Articles

1. Building a Gender-Bias-Resistant Super Corpus as a Deep Learning Baseline for Speech Emotion Recognition.
Sensors (Basel). 2025 Mar 22;25(7):1991. doi: 10.3390/s25071991.
2. Deep-Net: A Lightweight CNN-Based Speech Emotion Recognition System Using Deep Frequency Features.
Sensors (Basel). 2020 Sep 12;20(18):5212. doi: 10.3390/s20185212.
3. Cross-corpus speech emotion recognition with transformers: Leveraging handcrafted features and data augmentation.
Comput Biol Med. 2024 Sep;179:108841. doi: 10.1016/j.compbiomed.2024.108841. Epub 2024 Jul 12.
4. An enhanced speech emotion recognition using vision transformer.
Sci Rep. 2024 Jun 7;14(1):13126. doi: 10.1038/s41598-024-63776-4.
5. Emotion recognition for human-computer interaction using high-level descriptors.
Sci Rep. 2024 May 27;14(1):12122. doi: 10.1038/s41598-024-59294-y.
6. A comprehensive study on bilingual and multilingual speech emotion recognition using a two-pass classification scheme.
PLoS One. 2019 Aug 15;14(8):e0220386. doi: 10.1371/journal.pone.0220386. eCollection 2019.
7. Fusion-ConvBERT: Parallel Convolution and BERT Fusion for Speech Emotion Recognition.
Sensors (Basel). 2020 Nov 23;20(22):6688. doi: 10.3390/s20226688.
8. DSTCNet: Deep Spectro-Temporal-Channel Attention Network for Speech Emotion Recognition.
IEEE Trans Neural Netw Learn Syst. 2025 Jan;36(1):188-197. doi: 10.1109/TNNLS.2023.3304516. Epub 2025 Jan 7.
9. Evaluating deep learning architectures for Speech Emotion Recognition.
Neural Netw. 2017 Aug;92:60-68. doi: 10.1016/j.neunet.2017.02.013. Epub 2017 Mar 21.
10. MelTrans: Mel-Spectrogram Relationship-Learning for Speech Emotion Recognition via Transformers.
Sensors (Basel). 2024 Aug 25;24(17):5506. doi: 10.3390/s24175506.
