Emotion recognition for human-computer interaction using high-level descriptors.

Author affiliations

Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India.

Department of Computer Science, Multani Mal Modi College, Patiala, Punjab, India.

Publication information

Sci Rep. 2024 May 27;14(1):12122. doi: 10.1038/s41598-024-59294-y.

DOI:10.1038/s41598-024-59294-y

PMID:38802373

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11130182/

Abstract

Recent research has applied Deep Learning (DL) techniques, particularly Convolutional Neural Networks (CNNs), extensively to Speech Emotion Recognition (SER). This study applies DL to SER for Punjabi language speakers. The paper presents a novel approach to constructing and preprocessing a labeled speech corpus drawn from diverse social media sources. Using spectrograms as the primary feature representation, the proposed algorithm learns discriminative patterns for emotion recognition. The method is evaluated on a custom dataset derived from various Punjabi media sources, including films and web series. The proposed approach achieves an accuracy of 69%, surpassing traditional methods such as decision trees, Naïve Bayes, and random forests, which achieved accuracies of 49%, 52%, and 61%, respectively. The proposed method thus improves accuracy in recognizing emotions from Punjabi speech signals.
