Detection of Emotion of Speech for RAVDESS Audio Using Hybrid Convolution Neural Network.

Author information

ICT Ganpat University, Ahmedabad, Gujarat, India.

Computer Science and Engineering, Jagran Lakecity University, Bhopal, India.

Publication information

J Healthc Eng. 2022 Feb 27;2022:8472947. doi: 10.1155/2022/8472947. eCollection 2022.

DOI:10.1155/2022/8472947

PMID:35265307

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8898841/

Abstract

People have emotional responses to the things that concern them, and a customer's emotional state can help a customer representative understand what that customer needs. Speech emotion recognition therefore plays an important role in human interaction. To support it with an intelligent system, we design a convolutional neural network (CNN)-based model that classifies emotions into broad categories, such as positive and negative, or into more specific ones. In this paper, we use the audio recordings of the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). Log-Mel spectrograms and Mel-frequency cepstral coefficients (MFCCs) were extracted as features from the raw audio files. These features were used to classify emotions with techniques such as Long Short-Term Memory (LSTM) networks, CNNs, hidden Markov models (HMMs), and deep neural networks (DNNs). We divide the classification task into three sections, each handled for male and female speakers. In the first section, emotions are divided into two classes (positive and negative). In the second section, emotions are divided into three classes: positive, negative, and neutral. In the third section, emotions are divided into eight classes: happy, sad, angry, fearful, surprised, disgusted, calm, and neutral. For all three sections, we propose a model consisting of eight consecutive 2D convolutional layers. The proposed model classifies emotions more accurately than previously reported models, which makes it possible to identify a consumer's emotion more reliably.
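The three labelling schemes described in the abstract can be illustrated with a minimal Python sketch. It relies on the public RAVDESS file-naming convention (seven hyphen-separated fields, with the emotion code in the third field and the actor number in the seventh; odd-numbered actors are male). The function names `parse_ravdess_name` and `label_for` are hypothetical, and the grouping of emotions into positive, negative, and neutral is an assumption made for illustration only, since the abstract does not state which emotions fall into which group.

```python
from pathlib import Path

# RAVDESS emotion codes (third field of the file name).
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

# ASSUMPTION: the abstract does not define these groups; this split is illustrative.
POSITIVE = {"calm", "happy", "surprised"}
NEUTRAL = {"neutral"}


def parse_ravdess_name(filename):
    """Return (emotion, gender) for a name like '03-01-06-01-02-01-12.wav'."""
    fields = Path(filename).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"unexpected RAVDESS file name: {filename!r}")
    emotion = EMOTIONS[fields[2]]
    # Odd-numbered actors are male, even-numbered actors are female.
    gender = "male" if int(fields[6]) % 2 == 1 else "female"
    return emotion, gender


def label_for(filename, n_classes):
    """Return (gender, label) under the 2-, 3-, or 8-class scheme."""
    emotion, gender = parse_ravdess_name(filename)
    if n_classes == 8:
        label = emotion
    elif n_classes == 3:
        if emotion in POSITIVE:
            label = "positive"
        elif emotion in NEUTRAL:
            label = "neutral"
        else:
            label = "negative"
    elif n_classes == 2:
        # ASSUMPTION: neutral speech is folded into the positive class here.
        label = "positive" if emotion in POSITIVE | NEUTRAL else "negative"
    else:
        raise ValueError("n_classes must be 2, 3, or 8")
    return gender, label
```

For example, `label_for("03-01-06-01-02-01-12.wav", 3)` reads emotion code `06` (fearful) and actor `12` (female). Keeping the gender alongside the label makes it easy to train or evaluate separately on male and female recordings, which is one possible reading of the per-gender split the abstract mentions.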
