Fu Hongliang, Zhuang Zhihao, Wang Yang, Huang Chen, Duan Wenzhuo
College of Information Science and Engineering, Henan University of Technology, Zhengzhou 450001, China.
Henan Engineering Laboratory of Grain IOT Technology, Henan University of Technology, Zhengzhou 450001, China.
Entropy (Basel). 2023 Jan 7;25(1):124. doi: 10.3390/e25010124.
To address the discrepancy in feature distributions in cross-corpus speech emotion recognition, this paper proposes an emotion recognition model based on multi-task learning and subdomain adaptation that alleviates the impact of this discrepancy on recognition performance. Existing methods have shortcomings in speech feature representation and in aligning feature distributions across corpora. The proposed model uses a deep denoising autoencoder as the shared feature extraction network for multi-task learning, and a fully connected layer followed by a softmax layer is added for each recognition task as a task-specific layer. A subdomain adaptation algorithm for emotion and gender features is then added to the shared network to obtain shared emotion features and shared gender features across the source and target domains. Multi-task learning enhances the representation ability of the features, while the subdomain adaptation algorithm improves their transferability and alleviates the impact of distribution differences in the emotion features. Averaged over six cross-corpus speech emotion recognition experiments, the proposed model improves the weighted average recall by 1.89% to 10.07% over the compared models, which verifies its effectiveness.
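The abstract does not say which discrepancy measure the subdomain adaptation uses or how the task and adaptation losses are weighted. As a minimal sketch only, the code below assumes an LMMD-style class-conditional maximum mean discrepancy with a Gaussian kernel, using source one-hot labels and target soft predictions as per-class sample weights, plus a hypothetical weighted sum of the emotion and gender cross-entropy losses and the two discrepancy terms. All function names and the parameters `sigma`, `alpha` and `beta` are illustrative assumptions, not the authors' code.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise RBF kernel k(a, b) = exp(-||a - b||^2 / (2 sigma^2)).
    d2 = np.sum(x ** 2, axis=1)[:, None] + np.sum(y ** 2, axis=1)[None, :] - 2 * x @ y.T
    return np.exp(-np.maximum(d2, 0.0) / (2 * sigma ** 2))

def class_weights(probs):
    # Normalize each class column so one class's weights sum to 1 over the batch.
    totals = probs.sum(axis=0, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for classes absent from the batch
    return probs / totals

def subdomain_mmd(xs, ys_onehot, xt, pt_soft, sigma=1.0):
    """Assumed LMMD-style class-conditional discrepancy (biased estimate).

    xs: (ns, d) source features, ys_onehot: (ns, C) source labels,
    xt: (nt, d) target features, pt_soft: (nt, C) predicted target probabilities.
    This sketch does not mask classes missing from one of the two domains.
    """
    ws = class_weights(ys_onehot.astype(float))
    wt = class_weights(pt_soft.astype(float))
    kss = gaussian_kernel(xs, xs, sigma)
    ktt = gaussian_kernel(xt, xt, sigma)
    kst = gaussian_kernel(xs, xt, sigma)
    total = 0.0
    num_classes = ys_onehot.shape[1]
    for c in range(num_classes):
        a, b = ws[:, c], wt[:, c]
        # Weighted MMD^2 between the class-c subdomains of source and target.
        total += a @ kss @ a + b @ ktt @ b - 2 * a @ kst @ b
    return total / num_classes

def cross_entropy(probs, onehot, eps=1e-12):
    # Mean cross-entropy of softmax outputs against one-hot labels.
    return -np.mean(np.sum(onehot * np.log(probs + eps), axis=1))

def multitask_loss(p_emo, y_emo, p_gen, y_gen, d_emo, d_gen, alpha=1.0, beta=0.5):
    # Hypothetical combination: emotion + gender task losses plus weighted
    # emotion and gender subdomain discrepancies (alpha, beta are assumptions).
    return cross_entropy(p_emo, y_emo) + cross_entropy(p_gen, y_gen) + alpha * d_emo + beta * d_gen
```

When source and target features coincide and their class weights match, the discrepancy is zero; shifting the target features makes it positive, which is the signal an adaptation term would push the shared network to reduce.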