Impact of autoencoder based compact representation on emotion detection from audio.

Author information

Patel Nivedita, Patel Shireen, Mankad Sapan H

Affiliation

CSE Department, Institute of Technology, Nirma University, Ahmedabad, India.

Publication information

J Ambient Intell Humaniz Comput. 2022;13(2):867-885. doi: 10.1007/s12652-021-02979-3. Epub 2021 Mar 3.

DOI:10.1007/s12652-021-02979-3

PMID:33686349

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7927770/

Abstract

Emotion recognition from speech has many applications, and the field has consequently seen extensive research over the past few years. However, many existing solutions are not yet ready for real-time applications. In this work, we propose a compact representation of audio using conventional autoencoders for dimensionality reduction, and test the approach on two publicly available benchmark datasets. Such compact and simple classification systems, with low computing cost and efficient memory management, may be more useful for real-time applications. The system is evaluated on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) and the Toronto Emotional Speech Set (TESS). Three classifiers, namely support vector machines (SVM), a decision tree classifier, and convolutional neural networks (CNN), were implemented to judge the impact of the approach. Results obtained by attempting classification with AlexNet and ResNet50 are also reported. Observations showed that introducing autoencoders can indeed improve the classification accuracy of the emotion in the input audio files. We conclude that in emotion recognition from speech, the choice and application of dimensionality reduction of audio features affects the results achieved; working on this aspect of the general speech emotion recognition model may therefore yield substantial improvements in the future.
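The dimensionality-reduction idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: the abstract does not specify layer sizes, activations, or the audio features used, so everything below is an assumption. The feature matrix is synthetic (a hypothetical stand-in for per-clip audio features), the 40-to-8 bottleneck, tanh activation, learning rate, and epoch count are arbitrary illustrative choices, and the function names are invented for this example. Only NumPy is used, and the downstream classifier step is omitted.

```python
import numpy as np


def make_synthetic_features(n_samples=200, n_features=40, n_latent=5, seed=0):
    """Hypothetical stand-in for per-clip audio feature vectors (not real data)."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n_samples, n_latent))
    mixing = rng.normal(size=(n_latent, n_features))
    X = latent @ mixing + 0.1 * rng.normal(size=(n_samples, n_features))
    # Standardize each feature to zero mean and unit variance.
    return (X - X.mean(axis=0)) / X.std(axis=0)


def train_autoencoder(X, code_dim=8, epochs=500, lr=0.1, seed=0):
    """Train a one-hidden-layer autoencoder with full-batch gradient descent.

    Returns the encoder parameters and the reconstruction MSE per epoch.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    # Encoder maps n_features -> code_dim; decoder maps code_dim -> n_features.
    W_enc = rng.normal(0.0, 0.1, size=(n_features, code_dim))
    b_enc = np.zeros(code_dim)
    W_dec = rng.normal(0.0, 0.1, size=(code_dim, n_features))
    b_dec = np.zeros(n_features)
    losses = []
    for _ in range(epochs):
        # Forward pass: tanh bottleneck followed by a linear reconstruction.
        H = np.tanh(X @ W_enc + b_enc)
        X_hat = H @ W_dec + b_dec
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Backward pass for the mean squared reconstruction error.
        grad_out = 2.0 * err / err.size
        grad_W_dec = H.T @ grad_out
        grad_b_dec = grad_out.sum(axis=0)
        grad_pre = (grad_out @ W_dec.T) * (1.0 - H ** 2)  # tanh derivative
        grad_W_enc = X.T @ grad_pre
        grad_b_enc = grad_pre.sum(axis=0)
        # Gradient descent update.
        W_dec -= lr * grad_W_dec
        b_dec -= lr * grad_b_dec
        W_enc -= lr * grad_W_enc
        b_enc -= lr * grad_b_enc
    return (W_enc, b_enc), losses


def encode(X, params):
    """Project feature vectors onto the learned compact code."""
    W_enc, b_enc = params
    return np.tanh(X @ W_enc + b_enc)


X = make_synthetic_features()
params, losses = train_autoencoder(X)
codes = encode(X, params)
print("feature shape:", X.shape, "compact code shape:", codes.shape)
print(f"reconstruction MSE: first epoch {losses[0]:.3f}, last epoch {losses[-1]:.3f}")
```

In a pipeline of the kind the abstract describes, the compact `codes` matrix, rather than the full feature matrix, would then be passed to a classifier such as an SVM or a decision tree.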
