
Enhancing Speech Emotion Recognition Using Dual Feature Extraction Encoders.

Affiliations

Department of Computer Engineering, Gachon University, Seongnam 13120, Republic of Korea.

Department of Telecommunication Engineering, Nukus Branch of Tashkent University of Information Technologies Named after Muhammad Al-Khwarizmi, Nukus 230100, Uzbekistan.

Publication information

Sensors (Basel). 2023 Jul 24;23(14):6640. doi: 10.3390/s23146640.

Abstract

Understanding and identifying emotional cues in human speech is a crucial aspect of human-computer interaction. Applying computational methods to analyze emotions and to extract emotion-relevant features from speech is a central part of this process. The objective of this study was to design a speech emotion recognition framework based on spectrogram and semantic feature encoders, identifying clear shortcomings in existing methods and correcting them to improve recognition accuracy. Two complementary strategies were used to extract informative features. First, a fully convolutional neural network encoded the speech spectrograms. Second, Mel-frequency cepstral coefficient (MFCC) features were extracted and combined with Speech2Vec embeddings for semantic feature encoding. The two feature streams were processed separately and then fed into a long short-term memory (LSTM) network and a fully connected layer for further representation learning, strengthening the model's ability to recognize and interpret emotion in human speech. The proposed system was rigorously evaluated on two databases, RAVDESS and EMO-DB, where it outperformed established models, achieving 94.8% accuracy on RAVDESS and 94.0% on EMO-DB. This performance underscores the effectiveness of the proposed system for speech emotion recognition.
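The MFCC feature extraction step described above can be sketched as follows. This is a minimal pure-NumPy illustration of the standard MFCC pipeline (framing, windowing, power spectrum, mel filterbank, log, DCT-II); the frame length, hop size, filter count, and coefficient count are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13):
    # Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Mel filterbank energies, then log compression.
    log_energies = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # DCT-II decorrelates the log energies; keep the first n_coeffs.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1) / (2 * n_filters))
    return log_energies @ dct.T

# One second of a 440 Hz tone at 16 kHz as a toy input.
sig = np.sin(2 * np.pi * 440.0 * np.arange(16000) / 16000.0)
feats = mfcc(sig)
print(feats.shape)  # (98, 13): one 13-coefficient vector per frame
```

In the paper's pipeline, a matrix of per-frame vectors like `feats` would form one of the two feature streams (alongside the spectrogram-CNN features) that are processed separately before the LSTM and fully connected layers.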


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e433/10383041/b5c0c25f31ad/sensors-23-06640-g001.jpg
