Using i-vectors from voice features to identify major depressive disorder.

Affiliations

CAS Key Laboratory of Behavioral Science, Institute of Psychology, Beijing 100101, China; Department of Psychology, University of Chinese Academy of Sciences, Beijing 100049, China.

School of Optometry, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong.

Publication information

J Affect Disord. 2021 Jun 1;288:161-166. doi: 10.1016/j.jad.2021.04.004. Epub 2021 Apr 20.

DOI:10.1016/j.jad.2021.04.004

PMID:33895418

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC11681263/

Abstract

BACKGROUND

Machine-learning methods that use acoustic features to diagnose major depressive disorder (MDD) lack sufficient evidence from large-scale samples and clinical trials. This study aimed to evaluate the effectiveness of the promising i-vector method on a large sample of women with clinically diagnosed recurrent MDD, examine its robustness, and provide an explicit acoustic explanation of the i-vectors.

METHODS

We collected utterances edited from clinical interview speech recordings of 785 depressed and 1,023 healthy individuals. We then extracted Mel-frequency cepstral coefficient (MFCC) features and MFCC i-vectors from these utterances. To examine the effectiveness of the i-vectors, we compared the performance of binary logistic regression using MFCC i-vectors with that using MFCC features, and tested its robustness across different utterance durations. We also computed the correlations between MFCC features and MFCC i-vectors to analyze the acoustic meaning of the i-vectors.
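The comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the feature matrices are synthetic placeholders (the class separations `shift=0.3` and `shift=1.0` are arbitrary and do not reflect the reported results), the logistic regression is a plain gradient-descent fit, and all function and variable names are hypothetical. It only shows how two feature representations could be scored by held-out AUC under the same classifier.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit a binary logistic regression by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def decision_scores(X, w, b):
    """Linear scores; AUC depends only on their ranking."""
    return [b + sum(wj * xj for wj, xj in zip(w, xi)) for xi in X]


def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def synthetic_features(labels, shift, dim, rng):
    """Hypothetical stand-in for real features: class means differ by `shift`."""
    return [[rng.gauss(shift * yi, 1.0) for _ in range(dim)] for yi in labels]


def held_out_auc(X, y, n_train):
    """Train on the first n_train rows and report AUC on the remaining rows."""
    w, b = fit_logistic(X[:n_train], y[:n_train])
    return auc(decision_scores(X[n_train:], w, b), y[n_train:])


rng = random.Random(0)
labels = [i % 2 for i in range(400)]  # 1 = depressed, 0 = healthy (synthetic)
X_mfcc = synthetic_features(labels, shift=0.3, dim=5, rng=rng)      # placeholder "MFCC features"
X_ivectors = synthetic_features(labels, shift=1.0, dim=5, rng=rng)  # placeholder "i-vectors"

auc_mfcc = held_out_auc(X_mfcc, labels, n_train=200)
auc_ivectors = held_out_auc(X_ivectors, labels, n_train=200)
print(f"AUC, placeholder MFCC features: {auc_mfcc:.3f}")
print(f"AUC, placeholder i-vectors:     {auc_ivectors:.3f}")
```

With real data, the two matrices would hold per-utterance MFCC features and MFCC i-vectors for the same speakers, and the split would normally be replaced by cross-validation.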

RESULTS

Compared with MFCC features, the i-vectors improved the area under the curve (AUC) by 7% and 14% on different utterances. Classification results stabilized when the utterance duration was > 40 s. The i-vectors were consistently correlated (either positively or negatively) with the maximum, minimum, and deviations of the MFCC features.
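The kind of correlation analysis reported here can be illustrated with a small sketch. It assumes that "deviations" means the standard deviation of a coefficient across frames, and it uses synthetic data in which a hypothetical embedding coordinate is deliberately constructed to follow the frame-level spread; the names `functionals`, `pearson`, and `embedding_dim` are illustrative and not taken from the study.

```python
import math
import random
import statistics


def functionals(track):
    """Utterance-level summaries of one coefficient's frame-level values."""
    return {"max": max(track), "min": min(track), "std": statistics.pstdev(track)}


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)
summaries, embedding_dim = [], []
for _ in range(100):  # 100 synthetic utterances
    spread = rng.uniform(0.5, 2.0)
    track = [rng.gauss(0.0, spread) for _ in range(200)]  # 200 frames per utterance
    summaries.append(functionals(track))
    # Hypothetical embedding coordinate built to follow the spread (an assumption).
    embedding_dim.append(spread + rng.gauss(0.0, 0.1))

# Correlate each functional with the embedding coordinate across utterances.
corr = {name: pearson([s[name] for s in summaries], embedding_dim)
        for name in ("max", "min", "std")}
for name, r in corr.items():
    print(f"{name}: r = {r:+.2f}")
```

In this toy setup the coordinate correlates positively with the maximum and the standard deviation and negatively with the minimum, which shows how a single embedding dimension can relate to several functionals with different signs.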

LIMITATIONS

This study included only women.

CONCLUSIONS

The i-vectors can improve the AUC by 14% on a large-scale clinical sample. The system is robust for utterance durations > 40 s. This study provides a foundation for exploring the clinical application of voice features in the diagnosis of MDD.
