Wen Guihua, Li Huihui, Huang Jubing, Li Danyang, Xun Eryang
School of Computer Science and Engineering, South China University of Technology, Guangzhou, China.
Comput Intell Neurosci. 2017;2017:1945630. doi: 10.1155/2017/1945630. Epub 2017 Mar 5.
Human emotions can now be recognized from speech signals using machine learning methods; however, these methods suffer from low recognition accuracy in real applications because they lack rich representational ability. Deep belief networks (DBN) can automatically discover multiple levels of representation in speech signals. To make full use of this advantage, this paper presents an ensemble of random deep belief networks (RDBN) for speech emotion recognition. The method first extracts low-level features from the input speech signal and then uses them to construct many random subspaces. Each random subspace is fed to a DBN, which yields higher-level features that serve as the input to a classifier that outputs an emotion label. The emotion labels from all subspaces are then fused by majority voting to decide the final emotion label for the input speech signal. Experimental results on benchmark speech emotion databases show that RDBN achieves higher accuracy than the compared methods for speech emotion recognition.
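The pipeline described in the abstract (random subspaces, one base learner per subspace, majority voting) can be illustrated with a minimal sketch. This is not the authors' implementation: the DBN and its classifier are replaced here by a simple nearest-centroid learner so the example stays self-contained, and all names (`make_subspaces`, `NearestCentroid`, `fit_ensemble`, `predict`), the subspace sizes, and the toy feature vectors are assumptions made for illustration only.

```python
import random
from collections import Counter


def make_subspaces(n_features, n_subspaces, subspace_size, seed=0):
    """Draw random subsets of feature indices (the random subspaces)."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), subspace_size))
            for _ in range(n_subspaces)]


def project(x, indices):
    """Keep only the features of x that belong to one subspace."""
    return [x[i] for i in indices]


class NearestCentroid:
    """Stand-in base learner (NOT a DBN): predicts the closest class mean."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict_one(self, x):
        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label]))
        return min(sorted(self.centroids), key=sq_dist)


def fit_ensemble(X, y, n_subspaces=15, subspace_size=2, seed=0):
    """Train one base learner per random subspace."""
    subspaces = make_subspaces(len(X[0]), n_subspaces, subspace_size, seed)
    return [(idx, NearestCentroid().fit([project(x, idx) for x in X], y))
            for idx in subspaces]


def predict(ensemble, x):
    """Fuse the per-subspace labels by majority voting.

    Ties resolve to the label that was voted first.
    """
    votes = [model.predict_one(project(x, idx)) for idx, model in ensemble]
    return Counter(votes).most_common(1)[0][0]


# Toy low-level feature vectors (hypothetical values, not real speech data).
X_train = [[0.9, 0.8, 0.7, 0.9], [0.8, 0.9, 0.9, 0.7],
           [0.1, 0.2, 0.3, 0.1], [0.2, 0.1, 0.1, 0.3]]
y_train = ["happy", "happy", "sad", "sad"]

ensemble = fit_ensemble(X_train, y_train, n_subspaces=7, subspace_size=2)
print(predict(ensemble, [0.85, 0.9, 0.8, 0.75]))
```

In a closer reproduction, each `NearestCentroid` would presumably be swapped for a DBN feature extractor followed by a classifier, as the abstract describes; the subspace sampling and voting logic would stay the same.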