Impact of autoencoder based compact representation on emotion detection from audio.

Authors

Patel Nivedita, Patel Shireen, Mankad Sapan H

Affiliation

CSE Department, Institute of Technology, Nirma University, Ahmedabad, India.

Publication information

J Ambient Intell Humaniz Comput. 2022;13(2):867-885. doi: 10.1007/s12652-021-02979-3. Epub 2021 Mar 3.

DOI: 10.1007/s12652-021-02979-3
PMID: 33686349
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7927770/
Abstract

Emotion recognition from speech has many applications, and the field has consequently seen extensive research over the past few years. However, many existing solutions are not yet ready for real-time applications. In this work, we propose a compact representation of audio that uses conventional autoencoders for dimensionality reduction, and we test the approach on two publicly available benchmark datasets. Such compact and simple classification systems, with low computing cost and efficient memory use, may be more useful for real-time applications. The system is evaluated on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) and the Toronto Emotional Speech Set (TESS). Three classifiers, namely support vector machines (SVM), a decision tree classifier, and convolutional neural networks (CNN), were implemented to judge the impact of the approach. Results obtained with AlexNet and ResNet50 are also reported. The observations indicate that introducing autoencoders can improve the classification accuracy of the emotion in the input audio files. We conclude that, in emotion recognition from speech, the choice and application of dimensionality reduction for audio features affects the results achieved; working on this aspect of the general speech emotion recognition model may therefore yield substantial improvements in the future.
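
The dimensionality-reduction step described in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture, which the abstract does not specify. It assumes a purely linear autoencoder, synthetic 4-dimensional vectors standing in for audio features, and a 2-dimensional code. All names (`LinearAutoencoder`, `make_data`, the layer sizes, the learning rate) are hypothetical choices made for illustration only.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def make_data(n, seed=0):
    """Synthetic 4-D 'features' that actually lie on a 2-D subspace (assumption)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append([a, b, a + b, a - b])
    return data


class LinearAutoencoder:
    """Encoder maps n_in -> n_code, decoder maps n_code -> n_in (no activations)."""

    def __init__(self, n_in, n_code, seed=0):
        rng = random.Random(seed)
        self.enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_code)]
        self.dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_code)] for _ in range(n_in)]

    def encode(self, x):
        return matvec(self.enc, x)

    def decode(self, z):
        return matvec(self.dec, z)

    def loss(self, data):
        """Mean squared reconstruction error over the dataset."""
        total = 0.0
        for x in data:
            r = self.decode(self.encode(x))
            total += sum((ri - xi) ** 2 for ri, xi in zip(r, x))
        return total / len(data)

    def train(self, data, epochs=300, lr=0.02):
        """Plain per-sample gradient descent on the squared reconstruction error."""
        for _ in range(epochs):
            for x in data:
                z = self.encode(x)
                r = self.decode(z)
                err = [ri - xi for ri, xi in zip(r, x)]
                # Gradient with respect to the code, computed before the decoder changes.
                grad_z = [
                    sum(self.dec[i][k] * err[i] for i in range(len(x)))
                    for k in range(len(z))
                ]
                for i in range(len(x)):
                    for k in range(len(z)):
                        self.dec[i][k] -= lr * err[i] * z[k]
                for k in range(len(z)):
                    for j in range(len(x)):
                        self.enc[k][j] -= lr * grad_z[k] * x[j]


data = make_data(50)
ae = LinearAutoencoder(n_in=4, n_code=2)
initial_loss = ae.loss(data)
ae.train(data)
final_loss = ae.loss(data)
codes = [ae.encode(x) for x in data]  # compact representation for a downstream classifier
print(f"reconstruction error: {initial_loss:.4f} -> {final_loss:.4f}")
```

In a pipeline of the kind the abstract describes, the `codes` list, rather than the original feature vectors, would be passed to a classifier such as an SVM, a decision tree, or a CNN. Real audio features and a nonlinear autoencoder would replace the synthetic data and linear layers used here.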


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c70b/7927770/7f95d8f324a7/12652_2021_2979_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c70b/7927770/40da28119293/12652_2021_2979_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c70b/7927770/1cdc0190640c/12652_2021_2979_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c70b/7927770/15b1c172f0bc/12652_2021_2979_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c70b/7927770/9ccc15877ea6/12652_2021_2979_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c70b/7927770/1800169da811/12652_2021_2979_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c70b/7927770/a4a85ed97013/12652_2021_2979_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c70b/7927770/51afba4b1600/12652_2021_2979_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c70b/7927770/793e2cdbc388/12652_2021_2979_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c70b/7927770/9c9ca8f28819/12652_2021_2979_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c70b/7927770/b7279e46656e/12652_2021_2979_Fig11_HTML.jpg

Similar articles

1
Impact of autoencoder based compact representation on emotion detection from audio.
J Ambient Intell Humaniz Comput. 2022;13(2):867-885. doi: 10.1007/s12652-021-02979-3. Epub 2021 Mar 3.
2
Impact of Feature Selection Algorithm on Speech Emotion Recognition Using Deep Convolutional Neural Network.
Sensors (Basel). 2020 Oct 23;20(21):6008. doi: 10.3390/s20216008.
3
Fusing traditionally extracted features with deep learned features from the speech spectrogram for anger and stress detection using convolution neural network.
Multimed Tools Appl. 2022;81(21):31107-31128. doi: 10.1007/s11042-022-12886-0. Epub 2022 Apr 8.
4
Fusing Visual Attention CNN and Bag of Visual Words for Cross-Corpus Speech Emotion Recognition.
Sensors (Basel). 2020 Sep 28;20(19):5559. doi: 10.3390/s20195559.
5
A CNN-Assisted Enhanced Audio Signal Processing for Speech Emotion Recognition.
Sensors (Basel). 2019 Dec 28;20(1):183. doi: 10.3390/s20010183.
6
Detection of Emotion of Speech for RAVDESS Audio Using Hybrid Convolution Neural Network.
J Healthc Eng. 2022 Feb 27;2022:8472947. doi: 10.1155/2022/8472947. eCollection 2022.
7
Effect on speech emotion classification of a feature selection approach using a convolutional neural network.
PeerJ Comput Sci. 2021 Nov 3;7:e766. doi: 10.7717/peerj-cs.766. eCollection 2021.
8
Feature selection enhancement and feature space visualization for speech-based emotion recognition.
PeerJ Comput Sci. 2022 Nov 4;8:e1091. doi: 10.7717/peerj-cs.1091. eCollection 2022.
9
Utterance Level Feature Aggregation with Deep Metric Learning for Speech Emotion Recognition.
Sensors (Basel). 2021 Jun 20;21(12):4233. doi: 10.3390/s21124233.
10
Speech Emotion Recognition Using Attention Model.
Int J Environ Res Public Health. 2023 Mar 14;20(6):5140. doi: 10.3390/ijerph20065140.

Cited by

1
Tongue Muscle Training App for Middle-Aged and Older Adults Incorporating Flow-Based Gameplay: Design and Feasibility Pilot Study.
JMIR Serious Games. 2025 Jan 9;13:e53045. doi: 10.2196/53045.
2
A Classroom Emotion Recognition Model Based on a Convolutional Neural Network Speech Emotion Algorithm.
Occup Ther Int. 2022 Jul 7;2022:9563877. doi: 10.1155/2022/9563877. eCollection 2022.
