Ensemble learning with speaker embeddings in multiple speech task stimuli for depression detection.

Author information

Liu Zhenyu, Yu Huimin, Li Gang, Chen Qiongqiong, Ding Zhijie, Feng Lei, Yao Zhijun, Hu Bin

Affiliations

Gansu Provincial Key Laboratory of Wearable Computing, School of Information Science and Engineering, Lanzhou University, Lanzhou, China.

Tianshui Third People's Hospital, Tianshui, China.

Publication information

Front Neurosci. 2023 Mar 23;17:1141621. doi: 10.3389/fnins.2023.1141621. eCollection 2023.

Abstract

INTRODUCTION

As a biomarker of depression, the speech signal has attracted the interest of many researchers because it is easy to collect and non-invasive. However, recognition performance is affected by variation in subjects' speech across different scenes and emotional stimuli, by the insufficient amount of depression speech data for deep learning, and by the variable length of frame-level speech features.
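
The variable-length problem is what fixed-length speaker embeddings sidestep. In x-vector networks this is typically handled by a statistics-pooling layer that concatenates the per-dimension mean and standard deviation over all frames of an utterance. The sketch below applies that operation to toy frame vectors in pure Python, as an illustration only: in a real x-vector extractor the pooling acts on learned frame-level activations and the embedding is read from a later fully connected layer, while i-vectors are obtained by a different (factor-analysis) route. The function name `statistics_pooling` is hypothetical.

```python
import math
from typing import List, Sequence


def statistics_pooling(frames: Sequence[Sequence[float]]) -> List[float]:
    """Map a variable-length sequence of frame-level vectors (n_frames x dim)
    to one fixed-length vector of size 2 * dim: per-dimension means followed
    by per-dimension (population) standard deviations."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    if any(len(f) != dim for f in frames):
        raise ValueError("all frames must have the same dimension")
    n = len(frames)
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds
```

A 2-frame and a 3-frame utterance with 2-dimensional features both map to 4-dimensional vectors, so downstream classifiers such as SVM or RF can consume them directly.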

METHODS

To address these problems, this study proposes a multi-task ensemble learning method based on speaker embeddings for depression classification. First, we extract Mel Frequency Cepstral Coefficients (MFCC), Perceptual Linear Predictive coefficients (PLP), and Filter Bank (FBANK) features from an out-of-domain dataset (CN-Celeb) and use them to train a Resnet x-vector extractor, a Time Delay Neural Network (TDNN) x-vector extractor, and an i-vector extractor. Then, we use these extractors to obtain the corresponding fixed-length speaker embeddings from the depression speech database of the Gansu Provincial Key Laboratory of Wearable Computing. Support Vector Machine (SVM) and Random Forest (RF) classifiers produce classification results from the speaker embeddings for each of the nine speech tasks. To make full use of the information in speech tasks with different scenes and emotions, we aggregate the classification results of the nine tasks into new features and obtain the final classification with a Multilayer Perceptron (MLP). To exploit the complementarity of different acoustic features, Resnet x-vectors based on different acoustic features are fused within the ensemble learning method.

RESULTS

Experimental results demonstrate that (1) among the nine speaker embeddings, MFCC-based Resnet x-vectors perform best for depression detection; (2) interview speech outperforms picture-description speech, and among the three emotional valences, the neutral stimulus yields the best depression recognition; (3) our multi-task ensemble learning method with MFCC-based Resnet x-vectors can effectively identify depressed patients; and (4) in all cases, combining MFCC-based and PLP-based Resnet x-vectors in our ensemble learning method achieves the best results, outperforming other published studies that use the same depression speech database.

DISCUSSION

Our multi-task ensemble learning method with MFCC-based Resnet x-vectors can effectively fuse depression-related information from different stimuli, providing a new approach to depression detection. A limitation of this method is that the speaker embedding extractors were pre-trained on an out-of-domain dataset. In future work, we will consider pre-training on an augmented in-domain dataset to further improve depression recognition performance.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ff9/10076578/80e01b5956df/fnins-17-1141621-g001.jpg