Depression recognition using voice-based pre-training model.

Affiliation

School of Biomedical Engineering, South-Central Minzu University, No.182, Minzu Avenue, Hongshan District, Wuhan City, 430074, Hubei Province, China.

Publication information

Sci Rep. 2024 Jun 3;14(1):12734. doi: 10.1038/s41598-024-63556-0.

DOI:10.1038/s41598-024-63556-0

PMID:38830969

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC11637030/

Abstract

Early screening for depression helps patients obtain better diagnosis and treatment. While the effectiveness of utilizing voice data for depression detection has been demonstrated, the issue of insufficient dataset size remains unresolved. Therefore, we propose an artificial intelligence method to effectively identify depression. The wav2vec 2.0 voice-based pre-trained model was used as a feature extractor to automatically extract high-quality voice features from raw audio. Additionally, a small fine-tuning network was used as a classification model to output depression classification results. Subsequently, the proposed model was fine-tuned on the DAIC-WOZ dataset and achieved excellent classification results. Notably, the model demonstrated outstanding performance in binary classification, attaining an accuracy of 0.9649 and an RMSE of 0.1875 on the test set. Similarly, impressive results were obtained in multi-class classification, with an accuracy of 0.9481 and an RMSE of 0.3810. This is the first application of the wav2vec 2.0 model to depression recognition, and the model showed strong generalization ability. The method is simple and practical, and it can assist doctors in the early screening of depression.
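As a reading aid, the two metrics reported in the abstract (accuracy and RMSE) can be computed as in the minimal Python sketch below. The label vectors are hypothetical and are not data from the paper. The abstract does not state which quantity RMSE was computed on, so applying it to hard class labels here is an assumption. The function names `accuracy` and `rmse` are illustrative only.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted numeric labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


# Hypothetical binary labels (0 = not depressed, 1 = depressed).
# These values are made up for illustration; they are NOT from the paper.
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 0, 1, 0, 0, 1, 0, 0]

print(accuracy(y_true, y_pred))        # 7 of 8 correct -> 0.875
print(round(rmse(y_true, y_pred), 4))  # sqrt(1/8) -> 0.3536
```

With hard 0/1 predictions, every misclassification adds exactly 1 to the squared-error sum, so RMSE equals sqrt(1 - accuracy). In a multi-class setting with ordered numeric labels, RMSE also reflects how far a wrong prediction is from the true class, which accuracy alone does not capture.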

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/452e/11637030/aed1c642470b/41598_2024_63556_Fig1_HTML.jpg

Similar articles

Depression recognition using voice-based pre-training model.

Sci Rep. 2024 Jun 3;14(1):12734. doi: 10.1038/s41598-024-63556-0.

[A research on depression recognition based on voice pre-training model].

Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2024 Feb 25;41(1):9-16. doi: 10.7507/1001-5515.202304008.

End-to-end multimodal clinical depression recognition using deep neural networks: A comparative analysis.

Comput Methods Programs Biomed. 2021 Nov;211:106433. doi: 10.1016/j.cmpb.2021.106433. Epub 2021 Sep 28.

Improving speech depression detection using transfer learning with wav2vec 2.0 in low-resource environments.

Sci Rep. 2024 Apr 25;14(1):9543. doi: 10.1038/s41598-024-60278-1.

A hybrid approach for binary and multi-class classification of voice disorders using a pre-trained model and ensemble classifiers.

BMC Med Inform Decis Mak. 2025 May 1;25(1):177. doi: 10.1186/s12911-025-02978-w.

A New Regression Model for Depression Severity Prediction Based on Correlation among Audio Features Using a Graph Convolutional Neural Network.

Diagnostics (Basel). 2023 Feb 14;13(4):727. doi: 10.3390/diagnostics13040727.

Manifestation of depression in speech overlaps with characteristics used to represent and recognize speaker identity.

Sci Rep. 2023 Jul 10;13(1):11155. doi: 10.1038/s41598-023-35184-7.

WavBERT: Exploiting Semantic and Non-semantic Speech using Wav2vec and BERT for Dementia Detection.

Interspeech. 2021 Aug-Sep;2021:3790-3794. doi: 10.21437/interspeech.2021-332.

Optimizing depression detection in clinical doctor-patient interviews using a multi-instance learning framework.

Sci Rep. 2025 Feb 24;15(1):6637. doi: 10.1038/s41598-025-90117-w.

Voice Disorder Classification Using Wav2vec 2.0 Feature Extraction.

J Voice. 2024 Sep 25. doi: 10.1016/j.jvoice.2024.09.002.

Cited by

Depression detection methods based on multimodal fusion of voice and text.

Sci Rep. 2025 Jul 1;15(1):21907. doi: 10.1038/s41598-025-03524-4.

Method Matters: Enhancing Voice-Based Depression Detection With a New Data Collection Framework.

Depress Anxiety. 2025 May 20;2025:4839334. doi: 10.1155/da/4839334. eCollection 2025.

Intimate partner violence and stress-related disorders: from epigenomics to resilience.

Front Glob Womens Health. 2025 May 12;6:1536169. doi: 10.3389/fgwh.2025.1536169. eCollection 2025.
