Deb Suman, Dandapat Samarendra
IEEE Trans Cybern. 2018 Jan 8. doi: 10.1109/TCYB.2017.2787717.
In this paper, a novel multiscale amplitude feature is proposed using multiresolution analysis (MRA), and the significance of the vocal tract is investigated for emotion classification from the speech signal. MRA decomposes the speech signal into a number of sub-band signals. The proposed feature is computed by applying a sinusoidal model to each sub-band signal. Different emotions affect the vocal tract differently; as a result, the vocal tract responds in a unique way to each emotion. The vocal tract information is enhanced using pre-emphasis, so that the emotion information manifested in the vocal tract can be better exploited. This may help improve the performance of emotion classification. Emotion recognition is performed on four databases (the German emotional database EMODB, the interactive emotional dyadic motion capture database, the simulated stressed speech database, and the FAU AIBO database), using both the speech signal and speech with enhanced vocal tract information (SEVTI). The performance of the proposed multiscale amplitude feature is compared with three different types of features: 1) mel frequency cepstral coefficients; 2) the Teager energy operator (TEO)-based feature (TEO-CB-Auto-Env); and 3) the breathiness feature. The proposed feature outperforms the other features. In terms of recognition rate, the features derived from the SEVTI signal perform better than the features derived from the speech signal. The combination of the features, computed from the SEVTI signal, yields an average recognition rate of 86.7% on the EMODB database.
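The abstract names three processing steps: pre-emphasis to obtain the SEVTI signal, MRA decomposition into sub-bands, and a sinusoidal model applied to each sub-band. The Python sketch below strings these steps together under assumptions that are not taken from the paper: a first-order pre-emphasis filter with coefficient 0.97, a Haar wavelet as the MRA filter bank, sinusoidal amplitudes estimated by Hann-windowed DFT peak picking, and the mean amplitude per sub-band as the feature value. All function names and parameters are hypothetical; this illustrates the general idea, not the authors' exact feature.

```python
import cmath
import math


def pre_emphasis(x, alpha=0.97):
    """First-order pre-emphasis y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def haar_mra(x, levels):
    """Haar-wavelet MRA (assumed filter): returns coefficient lists [d1, ..., dL, aL]."""
    s = 1.0 / math.sqrt(2.0)
    approx, bands = list(x), []
    for _ in range(levels):
        if len(approx) % 2:
            approx.append(approx[-1])  # pad odd lengths by repeating the last sample
        pairs = range(0, len(approx), 2)
        bands.append([s * (approx[i] - approx[i + 1]) for i in pairs])  # high-pass detail
        approx = [s * (approx[i] + approx[i + 1]) for i in pairs]        # low-pass approximation
    bands.append(approx)
    return bands


def dft_magnitude(frame):
    """Magnitude of DFT bins 0..N/2 (naive O(N^2) DFT, adequate for short frames)."""
    n_len = len(frame)
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * n / n_len) for n, v in enumerate(frame)))
            for k in range(n_len // 2 + 1)]


def sinusoidal_amplitudes(frame, num_peaks=3):
    """Amplitudes of the strongest sinusoids in a frame via Hann-windowed peak picking."""
    n_len = len(frame)
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_len) for n in range(n_len)]  # periodic Hann
    mag = dft_magnitude([f * w for f, w in zip(frame, win)])
    peaks = sorted((mag[k] for k in range(1, len(mag) - 1)
                    if mag[k] >= mag[k - 1] and mag[k] > mag[k + 1]), reverse=True)
    gain = sum(win) / 2.0  # an on-bin sinusoid of amplitude A peaks at A * sum(win) / 2
    return [p / gain for p in peaks[:num_peaks]]


def multiscale_amplitude_feature(signal, levels=3, frame_len=64, hop=32,
                                 num_peaks=3, emphasize=True):
    """Hypothetical feature: mean sinusoidal amplitude in each MRA sub-band."""
    x = pre_emphasis(signal) if emphasize else list(signal)  # emphasize=True -> SEVTI-like input
    features = []
    for band in haar_mra(x, levels):
        frame_means = []
        for start in range(0, len(band) - frame_len + 1, hop):
            amps = sinusoidal_amplitudes(band[start:start + frame_len], num_peaks)
            if amps:
                frame_means.append(sum(amps) / len(amps))
        features.append(sum(frame_means) / len(frame_means) if frame_means else 0.0)
    return features


# Usage on a synthetic 0.1 s test signal at 16 kHz (not real emotional speech).
fs = 16000
speech = [math.sin(2 * math.pi * 440 * n / fs) + 0.3 * math.sin(2 * math.pi * 3000 * n / fs)
          for n in range(1600)]
print(multiscale_amplitude_feature(speech))                   # SEVTI-like input
print(multiscale_amplitude_feature(speech, emphasize=False))  # plain speech input
```

Running the same function with `emphasize=True` and `emphasize=False` mirrors, in miniature, the comparison between SEVTI-based and speech-based features described in the abstract. The Haar filter and naive DFT keep the sketch dependency-free; a practical system would more likely use a library wavelet transform and an FFT, with the wavelet, frame length, and peak count tuned to the task.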