Enhancing depression recognition through a mixed expert model by integrating speaker-related and emotion-related features.

Author information

Guo Weitong, He Qian, Lin Ziyu, Bu Xiaolong, Wang Ziyang, Li Dong, Yang Hongwu

Affiliations

School of Educational Technology, Northwest Normal University, Lanzhou, 730070, China.

Key Laboratory of Education Digitalization of Gansu Province, Lanzhou, 730070, China.

Publication information

Sci Rep. 2025 Feb 3;15(1):4064. doi: 10.1038/s41598-025-88313-9.

DOI:10.1038/s41598-025-88313-9

PMID:39900968

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11790824/

Abstract

The World Health Organization predicts that by 2030, depression will be the most common mental disorder, significantly affecting individuals, families, and society. Speech, as a sensitive indicator, reveals noticeable acoustic changes linked to physiological and cognitive variations, making it a crucial behavioral marker for detecting depression. However, existing studies often overlook the separation of speaker-related and emotion-related features in speech when recognizing depression. To tackle this challenge, we propose a Mixture-of-Experts (MoE) method that integrates speaker-related and emotion-related features for depression recognition. Our approach begins with a Time Delay Neural Network to pre-train a speaker-related feature extractor using a large-scale speaker recognition dataset while simultaneously pre-training a speaker's emotion-related feature extractor with a speech emotion dataset. We then apply transfer learning to extract both features from a depression dataset, followed by fusion. A multi-domain adaptation algorithm trains the MoE model for depression recognition. Experimental results demonstrate that our method achieves 74.3% accuracy on a self-built Chinese localized depression dataset and an MAE of 6.32 on the AVEC2014 dataset. Thus, it outperforms state-of-the-art deep learning methods that use speech features. Additionally, our approach shows strong performance across Chinese and English speech datasets, highlighting its effectiveness in addressing cultural variations.
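The fusion and Mixture-of-Experts steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the embeddings are random stand-ins for the outputs of the pre-trained speaker and emotion extractors, fusion by concatenation and a softmax gate are assumptions (the abstract does not specify either), the weights are untrained, and the multi-domain adaptation training is not reproduced. All names (`fuse`, `MixtureOfExperts`, the dimensions) are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Compute weights @ x + bias for a list-of-rows weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_linear(rng, n_out, n_in):
    """Small random weights and zero bias (untrained, for illustration)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def fuse(speaker_emb, emotion_emb):
    """Fuse by concatenation (an assumption; the abstract only says 'fusion')."""
    return list(speaker_emb) + list(emotion_emb)


class MixtureOfExperts:
    """Dense MoE: a softmax gate weights the class probabilities of each expert."""

    def __init__(self, n_in, n_experts, n_classes, seed=0):
        rng = random.Random(seed)
        self.gate = init_linear(rng, n_experts, n_in)
        self.experts = [init_linear(rng, n_classes, n_in)
                        for _ in range(n_experts)]

    def forward(self, x):
        # Gate: one non-negative weight per expert, summing to 1.
        gate_weights = softmax(linear(*self.gate, x))
        # Each expert is a linear classifier producing class probabilities.
        expert_probs = [softmax(linear(w, b, x)) for w, b in self.experts]
        n_classes = len(expert_probs[0])
        # Output: gate-weighted average of the expert distributions.
        mixed = [sum(g * p[c] for g, p in zip(gate_weights, expert_probs))
                 for c in range(n_classes)]
        return mixed, gate_weights


# Stand-in embeddings; in the paper these come from pre-trained extractors.
rng = random.Random(1)
speaker_emb = [rng.gauss(0.0, 1.0) for _ in range(8)]
emotion_emb = [rng.gauss(0.0, 1.0) for _ in range(4)]

fused = fuse(speaker_emb, emotion_emb)
model = MixtureOfExperts(n_in=len(fused), n_experts=3, n_classes=2)
probs, gates = model.forward(fused)
print("class probabilities:", probs)
print("gate weights:", gates)
```

Because the gate weights and every expert output are probability distributions, the mixed output is itself a valid distribution over the two classes (here, hypothetically, depressed versus not depressed). For the regression setting reported on AVEC2014, the experts would instead emit a scalar score, and the gate would average those scores.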

