

Enhancing Speech Emotion Recognition Using Dual Feature Extraction Encoders.

Affiliations

Department of Computer Engineering, Gachon University, Seongnam 13120, Republic of Korea.

Department of Telecommunication Engineering, Nukus Branch of Tashkent University of Information Technologies Named after Muhammad Al-Khwarizmi, Nukus 230100, Uzbekistan.

Publication

Sensors (Basel). 2023 Jul 24;23(14):6640. doi: 10.3390/s23146640.

DOI: 10.3390/s23146640
PMID: 37514933
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10383041/
Abstract

Understanding and identifying emotional cues in human speech is a crucial aspect of human-computer communication. The application of computer technology in dissecting and deciphering emotions, along with the extraction of relevant emotional characteristics from speech, forms a significant part of this process. The objective of this study was to architect an innovative framework for speech emotion recognition predicated on spectrograms and semantic feature transcribers, aiming to bolster performance precision by acknowledging the conspicuous inadequacies in extant methodologies and rectifying them. To procure invaluable attributes for speech detection, this investigation leveraged two divergent strategies. Primarily, a wholly convolutional neural network model was engaged to transcribe speech spectrograms. Subsequently, a cutting-edge Mel-frequency cepstral coefficient feature abstraction approach was adopted and integrated with Speech2Vec for semantic feature encoding. These dual forms of attributes underwent individual processing before they were channeled into a long short-term memory network and a comprehensive connected layer for supplementary representation. By doing so, we aimed to bolster the sophistication and efficacy of our speech emotion detection model, thereby enhancing its potential to accurately recognize and interpret emotion from human speech. The proposed mechanism underwent a rigorous evaluation process employing two distinct databases: RAVDESS and EMO-DB. The outcome displayed a predominant performance when juxtaposed with established models, registering an impressive accuracy of 94.8% on the RAVDESS dataset and a commendable 94.0% on the EMO-DB dataset. This superior performance underscores the efficacy of our innovative system in the realm of speech emotion recognition, as it outperforms current frameworks in accuracy metrics.
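The first branch of the described pipeline feeds speech spectrograms to a fully convolutional encoder. As a rough illustration of that input representation only (the frame length, hop size, and sample rate below are common defaults, not parameters taken from the paper), a magnitude spectrogram can be computed from a raw waveform with a windowed short-time Fourier transform:

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=400, hop=160):
    """Slice the waveform into overlapping frames, apply a Hann window,
    and take the FFT magnitude of each frame. 400/160 samples correspond
    to 25 ms / 10 ms at 16 kHz -- typical values, assumed here."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft yields frame_len // 2 + 1 frequency bins per frame
    return np.abs(np.fft.rfft(frames, axis=1))

# One second of a 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
spec = magnitude_spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # → (98, 201): 98 frames, 201 frequency bins
```

With a 400-sample FFT at 16 kHz the bin spacing is 40 Hz, so the 440 Hz tone peaks in bin 11 of every frame; in the actual system an image-like time-frequency array of this kind is what the convolutional encoder consumes.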


Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e433/10383041/b5c0c25f31ad/sensors-23-06640-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e433/10383041/66fe3e6a7689/sensors-23-06640-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e433/10383041/b41a556a8551/sensors-23-06640-g003.jpg

Similar Articles

1. Enhancing Speech Emotion Recognition Using Dual Feature Extraction Encoders.
   Sensors (Basel). 2023 Jul 24;23(14):6640. doi: 10.3390/s23146640.
2. A Hybrid Time-Distributed Deep Neural Architecture for Speech Emotion Recognition.
   Int J Neural Syst. 2022 Jun;32(6):2250024. doi: 10.1142/S0129065722500241. Epub 2022 May 12.
3. Detection of Emotion of Speech for RAVDESS Audio Using Hybrid Convolution Neural Network.
   J Healthc Eng. 2022 Feb 27;2022:8472947. doi: 10.1155/2022/8472947. eCollection 2022.
4. A comprehensive study on bilingual and multilingual speech emotion recognition using a two-pass classification scheme.
   PLoS One. 2019 Aug 15;14(8):e0220386. doi: 10.1371/journal.pone.0220386. eCollection 2019.
5. IoT-Enabled WBAN and Machine Learning for Speech Emotion Recognition in Patients.
   Sensors (Basel). 2023 Mar 8;23(6):2948. doi: 10.3390/s23062948.
6. Impact of Feature Selection Algorithm on Speech Emotion Recognition Using Deep Convolutional Neural Network.
   Sensors (Basel). 2020 Oct 23;20(21):6008. doi: 10.3390/s20216008.
7. Fusing Visual Attention CNN and Bag of Visual Words for Cross-Corpus Speech Emotion Recognition.
   Sensors (Basel). 2020 Sep 28;20(19):5559. doi: 10.3390/s20195559.
8. Speech Emotion Recognition Using Convolution Neural Networks and Multi-Head Convolutional Transformer.
   Sensors (Basel). 2023 Jul 7;23(13):6212. doi: 10.3390/s23136212.
9. Fusing traditionally extracted features with deep learned features from the speech spectrogram for anger and stress detection using convolution neural network.
   Multimed Tools Appl. 2022;81(21):31107-31128. doi: 10.1007/s11042-022-12886-0. Epub 2022 Apr 8.
10. Effect on speech emotion classification of a feature selection approach using a convolutional neural network.
    PeerJ Comput Sci. 2021 Nov 3;7:e766. doi: 10.7717/peerj-cs.766. eCollection 2021.
