

Emotion Recognition from Chinese Speech for Smart Affective Services Using a Combination of SVM and DBN.

Affiliations

Department of Software Engineering, China University of Petroleum, No. 66 Changjiang West Road, Qingdao 266031, China.

Department of Information Processing Science, University of Oulu, Oulu FI-91004, Finland.

Publication information

Sensors (Basel). 2017 Jul 24;17(7):1694. doi: 10.3390/s17071694.

Abstract

Accurate emotion recognition from speech is important for applications such as smart health care, smart entertainment, and other smart services. High-accuracy emotion recognition from Chinese speech is challenging because of the complexities of the Chinese language. In this paper, we explore how to improve the accuracy of speech emotion recognition, covering both speech signal feature extraction and emotion classification methods. Five types of features are extracted from a speech sample: mel frequency cepstrum coefficient (MFCC), pitch, formant, short-term zero-crossing rate, and short-term energy. By comparing statistical features with deep features extracted by a Deep Belief Network (DBN), we attempt to find the best features for identifying the emotional state of speech. We propose a novel classification method that combines DBN and SVM (support vector machine) instead of using only one of them. In addition, a conjugate gradient method is applied to train the DBN in order to speed up the training process. Gender-dependent experiments are conducted using an emotional speech database created by the Chinese Academy of Sciences. The results show that DBN features reflect emotional state better than artificial features, and our new classification approach achieves an accuracy of 95.8%, which is higher than using either DBN or SVM separately. The results also show that a DBN can work very well with small training databases if it is properly designed.
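
Two of the five feature types named in the abstract, short-term energy and short-term zero-crossing rate, are simple enough to illustrate directly. The following is a minimal sketch, not the authors' implementation: the abstract does not give frame length, hop size, or the exact feature definitions, so the function names, the frame parameters, and the summary statistics (mean, minimum, maximum) are all assumptions chosen for illustration.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; an incomplete final frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_term_energy(frame):
    """Sum of squared amplitudes within one frame."""
    return sum(x * x for x in frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def summarize(values):
    """Utterance-level statistics (mean, min, max) over per-frame values."""
    return (sum(values) / len(values), min(values), max(values))

# Hypothetical input: 0.1 s of a 200 Hz tone sampled at 8 kHz stands in for speech.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(sample_rate // 10)]

# Assumed framing: 25 ms frames (200 samples) with a 10 ms hop (80 samples).
frames = frame_signal(tone, frame_len=200, hop=80)
energy_stats = summarize([short_term_energy(f) for f in frames])
zcr_stats = summarize([zero_crossing_rate(f) for f in frames])
```

Statistics of this kind would form part of a fixed-length utterance vector that a classifier such as an SVM could consume; the deep features described in the abstract would instead be learned by the DBN.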

