Ensemble learning with speaker embeddings in multiple speech task stimuli for depression detection.

Authors

Liu Zhenyu, Yu Huimin, Li Gang, Chen Qiongqiong, Ding Zhijie, Feng Lei, Yao Zhijun, Hu Bin

Affiliations

Gansu Provincial Key Laboratory of Wearable Computing, School of Information Science and Engineering, Lanzhou University, Lanzhou, China.

Tianshui Third People's Hospital, Tianshui, China.

Publication information

Front Neurosci. 2023 Mar 23;17:1141621. doi: 10.3389/fnins.2023.1141621. eCollection 2023.

DOI:10.3389/fnins.2023.1141621

PMID:37034153

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10076578/

Abstract

INTRODUCTION

As a biomarker of depression, the speech signal has attracted the interest of many researchers because it is easy to collect and non-invasive. However, variation in subjects' speech across scenes and emotional stimuli, the insufficient amount of depression speech data for deep learning, and the variable length of frame-level speech features all degrade recognition performance.
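The variable-length issue can be illustrated with a small sketch. x-vector-style extractors typically end their frame-level layers with a statistics pooling step that concatenates the per-dimension mean and standard deviation over time, so utterances of any duration map to a vector of the same size. The function name `statistics_pooling` and the toy frames below are illustrative assumptions, not the authors' extractor.

```python
import math

def statistics_pooling(frames):
    """Map a list of frame-level feature vectors (num_frames x dim)
    to one fixed-length vector of size 2 * dim: means followed by stds."""
    n = len(frames)
    columns = list(zip(*frames))  # one tuple per feature dimension
    means = [sum(col) / n for col in columns]
    # Population standard deviation per dimension
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n)
            for col, m in zip(columns, means)]
    return means + stds

# Two toy "utterances" with different numbers of frames (2-dimensional features)
short_utt = [[0.0, 1.0], [2.0, 3.0]]
long_utt = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]

pooled_short = statistics_pooling(short_utt)
pooled_long = statistics_pooling(long_utt)
print(len(pooled_short), len(pooled_long))
```

Both pooled vectors have the same length regardless of how many frames went in, which is what lets a conventional classifier consume them.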

METHODS

To address these problems, this study proposes a multi-task ensemble learning method based on speaker embeddings for depression classification. First, we extract Mel Frequency Cepstral Coefficients (MFCC), Perceptual Linear Predictive Coefficients (PLP), and Filter Bank (FBANK) features from an out-of-domain dataset (CN-Celeb) and train a Resnet x-vector extractor, a Time Delay Neural Network (TDNN) x-vector extractor, and an i-vector extractor. Then, we extract the corresponding fixed-length speaker embeddings from the depression speech database of the Gansu Provincial Key Laboratory of Wearable Computing. Support Vector Machine (SVM) and Random Forest (RF) classifiers are used to obtain the classification results of the speaker embeddings in nine speech tasks. To make full use of the information in speech tasks with different scenes and emotions, we aggregate the classification results of the nine tasks into new features and then obtain the final classification results using a Multilayer Perceptron (MLP). To take advantage of the complementary effects of different features, Resnet x-vectors based on different acoustic features are fused in the ensemble learning method.
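The aggregation step described above is a form of stacking: one classifier per speech task, whose outputs become the input features of a final classifier. The sketch below shows only that data flow, under loud assumptions: the embeddings are synthetic, a nearest-centroid scorer stands in for the SVM/RF task classifiers, a logistic regression stands in for the MLP, and the three-way subject split is a choice made here to keep the final classifier from seeing its own training outputs. All function names are hypothetical, and the fusion of MFCC-based and PLP-based x-vectors is not shown.

```python
import math
import random

N_TASKS, DIM = 9, 8  # nine speech tasks as in the study; DIM is an arbitrary toy size

def make_subject(rng, label):
    """Toy stand-in for one subject's nine fixed-length embeddings (synthetic data)."""
    return [[rng.gauss(0.8 * label, 1.0) for _ in range(DIM)] for _ in range(N_TASKS)]

def centroid(rows):
    """Per-dimension mean of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit_task_classifier(X, y):
    """Stand-in for SVM/RF: store one centroid per class (0 = control, 1 = depressed)."""
    return {c: centroid([x for x, lab in zip(X, y) if lab == c]) for c in (0, 1)}

def task_score(model, x):
    """Soft score in [0, 1]; higher means closer to the class-1 centroid."""
    return 1.0 / (1.0 + math.exp(math.dist(x, model[1]) - math.dist(x, model[0])))

def stack_features(models, subject):
    """Aggregate the per-task scores into one new feature vector."""
    return [task_score(m, emb) for m, emb in zip(models, subject)]

def fit_meta(F, y, lr=0.1, epochs=200):
    """Stand-in for the MLP: logistic regression trained by stochastic gradient descent."""
    w, b = [0.0] * len(F[0]), 0.0
    for _ in range(epochs):
        for f, lab in zip(F, y):
            z = b + sum(wi * fi for wi, fi in zip(w, f))
            g = 1.0 / (1.0 + math.exp(-z)) - lab  # gradient of the log loss w.r.t. z
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b

def predict_meta(meta, f):
    """Final decision: 1 if the linear score is positive, else 0."""
    w, b = meta
    return int(b + sum(wi * fi for wi, fi in zip(w, f)) > 0)

rng = random.Random(0)
labels = [i % 2 for i in range(90)]
subjects = [make_subject(rng, lab) for lab in labels]
base_idx, meta_idx, test_idx = range(0, 30), range(30, 60), range(60, 90)

# 1) Train one base classifier per speech task
models = [
    fit_task_classifier([subjects[i][t] for i in base_idx], [labels[i] for i in base_idx])
    for t in range(N_TASKS)
]
# 2) Per-task outputs on held-out subjects become the new features
F_meta = [stack_features(models, subjects[i]) for i in meta_idx]
meta = fit_meta(F_meta, [labels[i] for i in meta_idx])
# 3) Final classification on unseen subjects
preds = [predict_meta(meta, stack_features(models, subjects[i])) for i in test_idx]
accuracy = sum(p == labels[i] for p, i in zip(preds, test_idx)) / len(test_idx)
print(f"toy accuracy: {accuracy:.2f}")
```

The accuracy printed here reflects only the synthetic data and says nothing about the results reported in the paper.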

RESULTS

Experimental results demonstrate that (1) MFCC-based Resnet x-vectors perform best among the nine speaker embeddings for depression detection; (2) interview speech performs better than picture description speech, and the neutral stimulus performs best among the three emotional valences in the depression recognition task; (3) our multi-task ensemble learning method with MFCC-based Resnet x-vectors can effectively identify depressed patients; (4) in all cases, the combination of MFCC-based Resnet x-vectors and PLP-based Resnet x-vectors in our ensemble learning method achieves the best results, outperforming other published studies that use the same depression speech database.

DISCUSSION

Our multi-task ensemble learning method with MFCC-based Resnet x-vectors can effectively fuse the depression-related information of different stimuli, which provides a new approach for depression detection. A limitation of this method is that the speaker embedding extractors were pre-trained on an out-of-domain dataset. In future work, we will consider pre-training on an augmented in-domain dataset to further improve depression recognition performance.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ff9/10076578/80e01b5956df/fnins-17-1141621-g001.jpg
