Two-Way Feature Extraction for Speech Emotion Recognition Using Deep Learning.

Affiliations

Department of Computer Science Engineering & Information Technology, Jaypee Institute of Information Technology, A 10, Sector 62, Noida 201307, India.

School of Computer Science Engineering and Technology, Bennett University, Plot Nos 8-11, TechZone 2, Greater Noida 201310, India.

Publication

Sensors (Basel). 2022 Mar 19;22(6):2378. doi: 10.3390/s22062378.

DOI:10.3390/s22062378
PMID:35336548
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8949356/
Abstract

Recognizing human emotions by machines is a complex task. Deep learning models attempt to automate this process by rendering machines to exhibit learning capabilities. However, identifying human emotions from speech with good performance is still challenging. With the advent of deep learning algorithms, this problem has been addressed recently. However, most research work in the past focused on feature extraction as only one method for training. In this research, we have explored two different methods of extracting features to address effective speech emotion recognition. Initially, two-way feature extraction is proposed by utilizing super convergence to extract two sets of potential features from the speech data. For the first set of features, principal component analysis (PCA) is applied to obtain the first feature set. Thereafter, a deep neural network (DNN) with dense and dropout layers is implemented. In the second approach, mel-spectrogram images are extracted from audio files, and the 2D images are given as input to the pre-trained VGG-16 model. Extensive experiments and an in-depth comparative analysis over both the feature extraction methods with multiple algorithms and over two datasets are performed in this work. The RAVDESS dataset provided significantly better accuracy than using numeric features on a DNN.
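The first feature-extraction path described in the abstract (PCA-reduced acoustic features feeding a dense/dropout DNN) can be sketched as follows. This is a minimal illustration, not the authors' code: `pca_reduce`, the random stand-in feature matrix, and all sizes are hypothetical. A real pipeline would compute acoustic descriptors from RAVDESS audio and pass the reduced features to a DNN with dense and dropout layers; the second path would instead render mel-spectrograms as 2D images for a pre-trained VGG-16.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project feature vectors onto their top principal components
    (the paper applies PCA to the first feature set before the DNN)."""
    centered = features - features.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal axes,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Stand-in for per-utterance acoustic descriptors (e.g. MFCC statistics);
# these would come from the speech recordings in a real pipeline.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 40))           # 120 utterances, 40 raw features
X_reduced = pca_reduce(X, n_components=10)
print(X_reduced.shape)                   # (120, 10) -> input to the DNN
```

The reduced matrix keeps the directions of largest variance, which is the point of the PCA stage: a compact feature set for the subsequent dense network.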


Figures (from the PMC full text):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3c57/8949356/b01d6027e6cc/sensors-22-02378-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3c57/8949356/11f5cf1b991c/sensors-22-02378-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3c57/8949356/7cbbe257b42c/sensors-22-02378-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3c57/8949356/13a0e9fc4f57/sensors-22-02378-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3c57/8949356/11175d560fbe/sensors-22-02378-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3c57/8949356/632da05ab966/sensors-22-02378-g006.jpg

Similar articles

1. Two-Way Feature Extraction for Speech Emotion Recognition Using Deep Learning.
Sensors (Basel). 2022 Mar 19;22(6):2378. doi: 10.3390/s22062378.
2. Impact of Feature Selection Algorithm on Speech Emotion Recognition Using Deep Convolutional Neural Network.
Sensors (Basel). 2020 Oct 23;20(21):6008. doi: 10.3390/s20216008.
3. Speech emotion recognition using machine learning techniques: Feature extraction and comparison of convolutional neural network and random forest.
PLoS One. 2023 Nov 21;18(11):e0291500. doi: 10.1371/journal.pone.0291500. eCollection 2023.
4. A comprehensive study on bilingual and multilingual speech emotion recognition using a two-pass classification scheme.
PLoS One. 2019 Aug 15;14(8):e0220386. doi: 10.1371/journal.pone.0220386. eCollection 2019.
5. An enhanced speech emotion recognition using vision transformer.
Sci Rep. 2024 Jun 7;14(1):13126. doi: 10.1038/s41598-024-63776-4.
6. Speech Emotion Recognition Using Convolution Neural Networks and Multi-Head Convolutional Transformer.
Sensors (Basel). 2023 Jul 7;23(13):6212. doi: 10.3390/s23136212.
7. EEG-Based Multi-Modal Emotion Recognition using Bag of Deep Features: An Optimal Feature Selection Approach.
Sensors (Basel). 2019 Nov 28;19(23):5218. doi: 10.3390/s19235218.
8. Effect on speech emotion classification of a feature selection approach using a convolutional neural network.
PeerJ Comput Sci. 2021 Nov 3;7:e766. doi: 10.7717/peerj-cs.766. eCollection 2021.
9. Multi-Input Speech Emotion Recognition Model Using Mel Spectrogram and GeMAPS.
Sensors (Basel). 2023 Feb 3;23(3):1743. doi: 10.3390/s23031743.
10. IoT-Enabled WBAN and Machine Learning for Speech Emotion Recognition in Patients.
Sensors (Basel). 2023 Mar 8;23(6):2948. doi: 10.3390/s23062948.

Cited by

1. Speech emotion recognition based on a stacked autoencoders optimized by PSO based grass fibrous root optimization.
Sci Rep. 2025 Jul 18;15(1):26158. doi: 10.1038/s41598-025-08703-x.
2. MS-EmoBoost: a novel strategy for enhancing self-supervised speech emotion representations.
Sci Rep. 2025 Jul 1;15(1):21607. doi: 10.1038/s41598-025-94727-2.
3. Fusion of PCA and ICA in Statistical Subset Analysis for Speech Emotion Recognition.
Sensors (Basel). 2024 Sep 2;24(17):5704. doi: 10.3390/s24175704.
4. Enhancing Speech Emotion Recognition Using Dual Feature Extraction Encoders.
Sensors (Basel). 2023 Jul 24;23(14):6640. doi: 10.3390/s23146640.
5. Speech Emotion Recognition Using Attention Model.
Int J Environ Res Public Health. 2023 Mar 14;20(6):5140. doi: 10.3390/ijerph20065140.
6. Emotion Detection Based on Pupil Variation.
Healthcare (Basel). 2023 Jan 21;11(3):322. doi: 10.3390/healthcare11030322.
7. Improved Feature Parameter Extraction from Speech Signals Using Machine Learning Algorithm.
Sensors (Basel). 2022 Oct 24;22(21):8122. doi: 10.3390/s22218122.

References

1. AI-DRIVEN Novel Approach for Liver Cancer Screening and Prediction Using Cascaded Fully Convolutional Neural Network.
J Healthc Eng. 2022 Feb 1;2022:4277436. doi: 10.1155/2022/4277436. eCollection 2022.
2. A Novel Diabetes Healthcare Disease Prediction Framework Using Machine Learning Techniques.
J Healthc Eng. 2022 Jan 11;2022:1684017. doi: 10.1155/2022/1684017. eCollection 2022.
3. Efficient prediction of drug-drug interaction using deep learning models.
IET Syst Biol. 2020 Aug;14(4):211-216. doi: 10.1049/iet-syb.2019.0116.
4. Evaluating deep learning architectures for Speech Emotion Recognition.
Neural Netw. 2017 Aug;92:60-68. doi: 10.1016/j.neunet.2017.02.013. Epub 2017 Mar 21.