Kapoor Shalini, Kumar Tarun
Research Scholar, Dr. A.P.J Abdul Kalam Technical University, Lucknow, India.
Department of Computer Science & Engineering, Radha Govind Group of Institution, Meerut, India.
Multimed Tools Appl. 2022;81(21):31107-31128. doi: 10.1007/s11042-022-12886-0. Epub 2022 Apr 8.
Stress and anger are two negative emotions that affect individuals both mentally and physically, and there is a need to address them as early as possible. Automated systems are therefore needed to monitor mental states and to detect early signs of emotional health issues. In the present work, a convolutional neural network (CNN) is proposed for anger and stress detection using handcrafted features combined with deep-learned features extracted from the spectrogram. The objective of using a combined feature set is to gather information from two different representations of the speech signal, yielding more prominent features and boosting recognition accuracy. The proposed method is also more computationally efficient than similar approaches to emotion assessment. Preliminary results from experimental evaluation on three datasets, the Toronto Emotional Speech Set (TESS), the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and the Berlin Emotional Database (EMO-DB), indicate that categorical accuracy is boosted and cross-entropy loss is reduced to a considerable extent. The proposed CNN obtains training (T) and validation (V) categorical accuracies of T = 93.7%, V = 95.6% on TESS; T = 97.5%, V = 95.6% on EMO-DB; and T = 96.7%, V = 96.7% on RAVDESS.
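The combined feature set described above can be illustrated with a minimal sketch. This is not the authors' exact pipeline; the particular handcrafted features (zero-crossing rate, short-time energy, spectral centroid) and the spectrogram parameters are illustrative assumptions, standing in for whichever features and CNN input the paper actually uses.

```python
# Hedged sketch of the two-representation idea: a vector of handcrafted
# features plus a log spectrogram (the CNN input). Feature choices here
# are illustrative, not the paper's exact set.
import numpy as np
from scipy import signal


def handcrafted_features(x, sr):
    """A few classic handcrafted speech features (illustrative choices)."""
    # Zero-crossing rate: fraction of consecutive samples changing sign.
    zcr = np.mean(np.abs(np.diff(np.sign(x)))) / 2
    # Short-time energy over the whole frame.
    energy = np.mean(x ** 2)
    # Spectral centroid from the periodogram.
    f, pxx = signal.periodogram(x, fs=sr)
    centroid = np.sum(f * pxx) / (np.sum(pxx) + 1e-12)
    return np.array([zcr, energy, centroid])


def spectrogram_features(x, sr):
    """Log-magnitude spectrogram, the deep-learned-feature input to a CNN."""
    _, _, S = signal.spectrogram(x, fs=sr, nperseg=256)
    return np.log1p(S)


sr = 16000
# One second of a synthetic 440 Hz tone as a stand-in speech signal.
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
hf = handcrafted_features(x, sr)    # low-dimensional handcrafted vector
spec = spectrogram_features(x, sr)  # 2-D time-frequency image for the CNN
print(hf.shape, spec.shape)
```

In a fused model, the spectrogram would pass through convolutional layers while the handcrafted vector is concatenated with the flattened CNN features before the classification head.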