Human-Computer Interaction with a Real-Time Speech Emotion Recognition with Ensembling Techniques 1D Convolution Neural Network and Attention.

Affiliation

College of Engineering, Al Faisal University, P.O. Box 50927, Riyadh 11533, Saudi Arabia.

Publication information

Sensors (Basel). 2023 Jan 26;23(3):1386. doi: 10.3390/s23031386.

DOI:10.3390/s23031386
PMID:36772427
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9921095/
Abstract

Emotions play a crucial role in human mental life and are vital for identifying a person's behaviour and mental condition. Speech Emotion Recognition (SER) is the task of extracting a speaker's emotional state from their speech signal. SER is a growing discipline within human-computer interaction that has recently attracted increasing interest. Part of the appeal is that the set of universal emotions is small, so any intelligent system with enough computational capacity can learn to recognise them. The difficulty is that human speech is immensely diverse, which makes it hard to create a single, standardised recipe for detecting hidden emotions. This work attempted to address that difficulty by combining a multilingual emotional dataset with a more generalised and effective model for recognising human emotions. The model was developed in two stages: feature extraction, followed by classification of the extracted features. The zero-crossing rate (ZCR), root-mean-square energy (RMSE), and Mel-frequency cepstral coefficients (MFCCs) were extracted as features. Two proposed models, a 1D CNN combined with LSTM and attention and a proprietary 2D CNN architecture, were used for classification. The proposed 1D CNN with LSTM and attention performed better than the 2D CNN. On the EMO-DB, SAVEE, ANAD, and BAVED datasets, the model's accuracy was 96.72%, 97.13%, 96.72%, and 88.39%, respectively. The model outperformed several earlier efforts on the same datasets, demonstrating its generality and efficacy in recognising multiple emotions across languages.
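As a rough illustration of the feature-extraction stage described in the abstract, the sketch below computes two of the named features, ZCR and RMSE, frame by frame using only the Python standard library. It is not the authors' implementation: the function names, the 400-sample frame length, and the 160-sample hop are assumptions chosen for illustration (the abstract does not state them), and MFCC extraction is omitted because it needs a filter bank and a DCT that would not fit in a short example.

```python
import math


def frame_signal(signal, frame_length, hop_length):
    """Split a 1-D sequence of samples into overlapping frames (no padding)."""
    last_start = len(signal) - frame_length
    return [
        signal[start:start + frame_length]
        for start in range(0, last_start + 1, hop_length)
    ]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def rms_energy(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def extract_features(signal, frame_length=400, hop_length=160):
    """Return one (zcr, rms) pair per frame.

    frame_length and hop_length are illustrative assumptions
    (25 ms and 10 ms at a 16 kHz sample rate), not values from the paper.
    """
    frames = frame_signal(signal, frame_length, hop_length)
    return [(zero_crossing_rate(f), rms_energy(f)) for f in frames]
```

For example, calling `extract_features` on one second of a 440 Hz sine wave sampled at 16 kHz yields one feature pair per frame; in a pipeline like the one the abstract describes, such per-frame values would be stacked with the MFCCs before being passed to the classifier.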

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bad8/9921095/7bdb9d0b75cd/sensors-23-01386-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bad8/9921095/a38d73733d07/sensors-23-01386-g002a.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bad8/9921095/b62d344de763/sensors-23-01386-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bad8/9921095/f0c7470d1bce/sensors-23-01386-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bad8/9921095/b0e27a1d82a7/sensors-23-01386-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bad8/9921095/b178acdfe4cc/sensors-23-01386-g006a.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bad8/9921095/739d4c646e12/sensors-23-01386-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bad8/9921095/d502ed73d802/sensors-23-01386-g008a.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bad8/9921095/5a29d87f3563/sensors-23-01386-g009.jpg

Similar articles

1
Human-Computer Interaction with a Real-Time Speech Emotion Recognition with Ensembling Techniques 1D Convolution Neural Network and Attention.
Sensors (Basel). 2023 Jan 26;23(3):1386. doi: 10.3390/s23031386.
2
Human-Computer Interaction with Detection of Speaker Emotions Using Convolution Neural Networks.
Comput Intell Neurosci. 2022 Mar 31;2022:7463091. doi: 10.1155/2022/7463091. eCollection 2022.
3
Deep-Net: A Lightweight CNN-Based Speech Emotion Recognition System Using Deep Frequency Features.
Sensors (Basel). 2020 Sep 12;20(18):5212. doi: 10.3390/s20185212.
4
The Impact of Attention Mechanisms on Speech Emotion Recognition.
Sensors (Basel). 2021 Nov 12;21(22):7530. doi: 10.3390/s21227530.
5
Impact of Feature Selection Algorithm on Speech Emotion Recognition Using Deep Convolutional Neural Network.
Sensors (Basel). 2020 Oct 23;20(21):6008. doi: 10.3390/s20216008.
6
A comprehensive study on bilingual and multilingual speech emotion recognition using a two-pass classification scheme.
PLoS One. 2019 Aug 15;14(8):e0220386. doi: 10.1371/journal.pone.0220386. eCollection 2019.
7
Fusing Visual Attention CNN and Bag of Visual Words for Cross-Corpus Speech Emotion Recognition.
Sensors (Basel). 2020 Sep 28;20(19):5559. doi: 10.3390/s20195559.
8
Cascaded Convolutional Neural Network Architecture for Speech Emotion Recognition in Noisy Conditions.
Sensors (Basel). 2021 Jun 27;21(13):4399. doi: 10.3390/s21134399.
9
Detection of Emotion of Speech for RAVDESS Audio Using Hybrid Convolution Neural Network.
J Healthc Eng. 2022 Feb 27;2022:8472947. doi: 10.1155/2022/8472947. eCollection 2022.
10
Combining a parallel 2D CNN with a self-attention Dilated Residual Network for CTC-based discrete speech emotion recognition.
Neural Netw. 2021 Sep;141:52-60. doi: 10.1016/j.neunet.2021.03.013. Epub 2021 Mar 23.

Cited by

1
A Comprehensive Review of Multimodal Emotion Recognition: Techniques, Challenges, and Future Directions.
Biomimetics (Basel). 2025 Jun 27;10(7):418. doi: 10.3390/biomimetics10070418.
2
An enhanced speech emotion recognition using vision transformer.
Sci Rep. 2024 Jun 7;14(1):13126. doi: 10.1038/s41598-024-63776-4.
