School of Biomedical Engineering, South-Central Minzu University, No.182, Minzu Avenue, Hongshan District, Wuhan City, 430074, Hubei Province, China.
Sci Rep. 2024 Jun 3;14(1):12734. doi: 10.1038/s41598-024-63556-0.
Early screening for depression helps patients obtain better diagnosis and treatment. Although voice data has been shown to be effective for depression detection, the problem of insufficient dataset size remains unresolved. We therefore propose an artificial intelligence method to identify depression effectively. The pre-trained speech model wav2vec 2.0 was used as a feature extractor to automatically extract high-quality voice features from raw audio, and a small fine-tuning network was used as the classification model to output depression classification results. The proposed model was then fine-tuned on the DAIC-WOZ dataset and achieved strong classification results. In binary classification, it attained an accuracy of 0.9649 and an RMSE of 0.1875 on the test set; in multi-class classification, it attained an accuracy of 0.9481 and an RMSE of 0.3810. This is the first use of wav2vec 2.0 for depression recognition, and the model showed strong generalization ability. The method is simple and practical and can assist doctors in the early screening of depression.
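The abstract describes a two-stage design: a pre-trained wav2vec 2.0 model turns raw audio into frame-level features, and a small network on top outputs the depression class. The sketch below illustrates only the second stage and the two reported metrics, in dependency-free Python. It assumes the wav2vec 2.0 features have already been extracted as a list of frame vectors per recording and treats them as fixed inputs; the abstract does not say whether the wav2vec 2.0 weights are also updated during fine-tuning. The mean pooling, the single linear layer with softmax, the SGD loop, and the names `mean_pool`, `LinearHead`, `accuracy`, and `rmse` are hypothetical illustrations, not the authors' architecture. The abstract also does not define RMSE for a classifier; this sketch assumes it is the root-mean-square difference between true and predicted class indices.

```python
import math
import random
from typing import List, Sequence

# Assumption: each recording is already a sequence of wav2vec 2.0 frame vectors
# (e.g. 768 dims per frame for the base model). Plain lists keep this runnable
# without any ML library.


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level features over time into one utterance-level vector."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtract the max logit before exponentiating)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearHead:
    """Hypothetical small classification head: one linear layer plus softmax."""

    def __init__(self, in_dim: int, num_classes: int, seed: int = 0):
        rng = random.Random(seed)
        self.num_classes = num_classes
        self.w = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(num_classes)]
        self.b = [0.0] * num_classes

    def predict_proba(self, x: Sequence[float]) -> List[float]:
        logits = [sum(wi * xi for wi, xi in zip(row, x)) + bi
                  for row, bi in zip(self.w, self.b)]
        return softmax(logits)

    def predict(self, frames: Sequence[Sequence[float]]) -> int:
        probs = self.predict_proba(mean_pool(frames))
        return max(range(self.num_classes), key=probs.__getitem__)

    def fit(self, utterances, labels, lr: float = 0.1, epochs: int = 100) -> None:
        """Plain SGD on cross-entropy; the logit gradient is (probs - one_hot)."""
        for _ in range(epochs):
            for frames, y in zip(utterances, labels):
                x = mean_pool(frames)
                probs = self.predict_proba(x)
                for c in range(self.num_classes):
                    g = probs[c] - (1.0 if c == y else 0.0)
                    for d in range(len(x)):
                        self.w[c][d] -= lr * g * x[d]
                    self.b[c] -= lr * g


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of recordings whose predicted class matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Assumed definition: RMSE over integer class indices."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

With labels `[0, 1, 1, 0]` and predictions `[0, 1, 0, 0]`, `accuracy` gives 0.75 and `rmse` gives 0.5. In the multi-class case the two metrics can diverge: `[0, 1, 2, 2]` against `[0, 1, 0, 2]` still has an accuracy of 0.75, but RMSE rises to 1.0 because one error skips two classes. Under this assumed definition, RMSE penalizes how far off a wrong prediction is, which accuracy alone does not capture.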