Bidirectional parallel echo state network for speech emotion recognition.

Authors

Ibrahim Hemin, Loo Chu Kiong, Alnajjar Fady

Affiliations

Department of Artificial Intelligence, Faculty of Computer Science and Information Technology, Universiti Malaya, Kuala Lumpur, 50603 Malaysia.

College of Information Technology, UAE University, Al Ain, United Arab Emirates.

Publication information

Neural Comput Appl. 2022;34(20):17581-17599. doi: 10.1007/s00521-022-07410-2. Epub 2022 May 31.

Abstract

Speech is an effective way for humans to communicate and exchange complex information, and speech signals have attracted considerable attention in human-computer interaction. Emotion recognition from speech has therefore become an active research topic in human-machine interaction. In this paper, we propose a novel speech emotion recognition (SER) system that represents speech signals as multivariate time series of handcrafted features. A bidirectional echo state network (ESN) with two parallel reservoir layers is applied to capture additional independent information. The parallel reservoirs produce multiple representations for each direction of the bidirectional data, which are combined in two stages of concatenation. Sparse random projection is used to reduce the high-dimensional sparse output of both reservoirs, separately for each direction. Random over-sampling and random under-sampling are used to address the class imbalance of the speech emotion datasets. The proposed parallel ESN model is evaluated in speaker-independent experiments on the EMO-DB, SAVEE, RAVDESS, and FAU Aibo datasets. The results show that the proposed SER model outperforms both a single-reservoir model and state-of-the-art studies.
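The encoding pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names (`make_reservoir`, `last_state`, `sparse_projection`, `encode`), the reservoir size, spectral radius, densities, the use of the final reservoir state as the sequence representation, and the exact order of concatenation and projection are all assumptions made for illustration. The abstract only states that two parallel reservoirs process the data in both directions, that there are two stages of concatenation, and that sparse random projection is applied per direction.

```python
import numpy as np


def make_reservoir(n_in, n_res, rng, spectral_radius=0.9, density=0.1):
    """Build fixed random weights for one reservoir (hypothetical helper)."""
    w_in = rng.uniform(-1.0, 1.0, size=(n_res, n_in))
    w = rng.uniform(-1.0, 1.0, size=(n_res, n_res))
    w *= rng.random((n_res, n_res)) < density  # keep roughly `density` of the links
    radius = np.max(np.abs(np.linalg.eigvals(w)))
    if radius > 0:
        w *= spectral_radius / radius  # rescale to the target spectral radius
    return w_in, w


def last_state(x, w_in, w):
    """Drive the reservoir with a (T, n_in) sequence and return its final state."""
    h = np.zeros(w.shape[0])
    for x_t in x:
        h = np.tanh(w_in @ x_t + w @ h)
    return h


def sparse_projection(n_in, n_out, rng, density=1.0 / 3.0):
    """Sparse random projection matrix with entries in {-c, 0, +c}."""
    signs = rng.choice(
        [-1.0, 0.0, 1.0],
        size=(n_in, n_out),
        p=[density / 2, 1.0 - density, density / 2],
    )
    return signs * np.sqrt(1.0 / (density * n_out))


def encode(x, reservoirs, proj_fwd, proj_bwd):
    """Assumed pipeline: concatenate reservoirs per direction, project, then join."""
    # Stage 1: concatenate the parallel reservoir states within each direction.
    fwd = np.concatenate([last_state(x, w_in, w) for w_in, w in reservoirs])
    bwd = np.concatenate([last_state(x[::-1], w_in, w) for w_in, w in reservoirs])
    # Reduce each direction separately, then stage 2: join both directions.
    return np.concatenate([fwd @ proj_fwd, bwd @ proj_bwd])


rng = np.random.default_rng(0)
n_features, n_res, n_out = 12, 50, 8          # illustrative sizes only
x = rng.normal(size=(40, n_features))         # stand-in for one utterance's features
reservoirs = [make_reservoir(n_features, n_res, rng) for _ in range(2)]
proj_fwd = sparse_projection(2 * n_res, n_out, rng)
proj_bwd = sparse_projection(2 * n_res, n_out, rng)
z = encode(x, reservoirs, proj_fwd, proj_bwd)
print(z.shape)  # (16,)
```

In a setup like this, the vector `z` would be passed to a trained readout or classifier, and class rebalancing (random over-sampling or under-sampling) would be applied to the training set only; those steps are omitted here.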

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c148/9152839/c6caed5f833e/521_2022_7410_Fig1_HTML.jpg
