Enhancing depression recognition through a mixed expert model by integrating speaker-related and emotion-related features.

Author information

Guo Weitong, He Qian, Lin Ziyu, Bu Xiaolong, Wang Ziyang, Li Dong, Yang Hongwu

Affiliations

School of Educational Technology, Northwest Normal University, Lanzhou, 730070, China.

Key Laboratory of Education Digitalization of Gansu Province, Lanzhou, 730070, China.

Publication information

Sci Rep. 2025 Feb 3;15(1):4064. doi: 10.1038/s41598-025-88313-9.

DOI:10.1038/s41598-025-88313-9
PMID:39900968
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11790824/
Abstract

The World Health Organization predicts that by 2030, depression will be the most common mental disorder, significantly affecting individuals, families, and society. Speech, as a sensitive indicator, reveals noticeable acoustic changes linked to physiological and cognitive variations, making it a crucial behavioral marker for detecting depression. However, existing studies often overlook the separation of speaker-related and emotion-related features in speech when recognizing depression. To tackle this challenge, we propose a Mixture-of-Experts (MoE) method that integrates speaker-related and emotion-related features for depression recognition. Our approach begins with a Time Delay Neural Network to pre-train a speaker-related feature extractor using a large-scale speaker recognition dataset while simultaneously pre-training a speaker's emotion-related feature extractor with a speech emotion dataset. We then apply transfer learning to extract both features from a depression dataset, followed by fusion. A multi-domain adaptation algorithm trains the MoE model for depression recognition. Experimental results demonstrate that our method achieves 74.3% accuracy on a self-built Chinese localized depression dataset and an MAE of 6.32 on the AVEC2014 dataset. Thus, it outperforms state-of-the-art deep learning methods that use speech features. Additionally, our approach shows strong performance across Chinese and English speech datasets, highlighting its effectiveness in addressing cultural variations.
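The abstract's pipeline can be illustrated in plain Python. It fuses a speaker-related and an emotion-related embedding, routes the fused vector through a mixture of experts, and scores regression output with mean absolute error (MAE). This is a minimal sketch under stated assumptions, not the authors' model. Concatenation as the fusion step, linear experts, a single softmax gate, the toy numbers, and the function names `fuse`, `moe_forward`, and `mean_absolute_error` are all hypothetical. The TDNN feature extractors, the transfer-learning step, and the multi-domain adaptation training are omitted.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (list of rows, each len(x) long) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def fuse(speaker_emb: Vector, emotion_emb: Vector) -> Vector:
    """Hypothetical fusion step: plain concatenation of the two embeddings."""
    return list(speaker_emb) + list(emotion_emb)


def moe_forward(x: Vector, gate_w: Matrix, expert_ws: List[Matrix]) -> Vector:
    """Softmax-gated mixture of linear experts (hypothetical architecture).

    gate_w has one row per expert and yields one gating logit per expert;
    each expert_ws[i] maps x to the output space. The result is the
    gate-weighted sum of the expert outputs.
    """
    if len(gate_w) != len(expert_ws):
        raise ValueError("need exactly one gate row per expert")
    gates = softmax(matvec(gate_w, x))
    outputs = [matvec(w, x) for w in expert_ws]
    out_dim = len(outputs[0])
    return [sum(g * out[k] for g, out in zip(gates, outputs)) for k in range(out_dim)]


def mean_absolute_error(preds: Vector, targets: Vector) -> float:
    """MAE: average absolute difference between predicted and true scores."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("preds and targets must be non-empty and equal length")
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)


if __name__ == "__main__":
    # Toy 2-d speaker and 2-d emotion embeddings (made-up numbers).
    x = fuse([0.2, -0.1], [0.5, 0.3])
    gate_w = [[0.0] * 4, [0.0] * 4]       # zero gate weights -> uniform gating
    experts = [[[2.0, 0.0, 0.0, 0.0]],    # expert 0 reads only the first feature
               [[0.0, 0.0, 0.0, 0.0]]]    # expert 1 always outputs 0
    print(moe_forward(x, gate_w, experts))
    print(mean_absolute_error([1.0, 2.0, 3.0], [2.0, 2.0, 5.0]))
```

With zero gate weights the gate is uniform, so the output is the plain average of the expert outputs. A trained gate instead weights the experts differently for each input, and that input-dependent routing is what distinguishes a mixture of experts from a fixed ensemble.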

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa75/11790824/3ab1a6714267/41598_2025_88313_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa75/11790824/c130094f5f34/41598_2025_88313_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa75/11790824/e842c7838a41/41598_2025_88313_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa75/11790824/d372c4363e7a/41598_2025_88313_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa75/11790824/e7b2a8b3a46a/41598_2025_88313_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa75/11790824/c783d9a1ac5b/41598_2025_88313_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa75/11790824/07c7cd8e9792/41598_2025_88313_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa75/11790824/994b21c2277a/41598_2025_88313_Fig8_HTML.jpg

Similar articles

1
Enhancing depression recognition through a mixed expert model by integrating speaker-related and emotion-related features.
Sci Rep. 2025 Feb 3;15(1):4064. doi: 10.1038/s41598-025-88313-9.
2
Ensemble learning with speaker embeddings in multiple speech task stimuli for depression detection.
Front Neurosci. 2023 Mar 23;17:1141621. doi: 10.3389/fnins.2023.1141621. eCollection 2023.
3
Impact of Feature Selection Algorithm on Speech Emotion Recognition Using Deep Convolutional Neural Network.
Sensors (Basel). 2020 Oct 23;20(21):6008. doi: 10.3390/s20216008.
4
A fine-grained human facial key feature extraction and fusion method for emotion recognition.
Sci Rep. 2025 Feb 20;15(1):6153. doi: 10.1038/s41598-025-90440-2.
5
Speech Emotion Recognition Using Convolution Neural Networks and Multi-Head Convolutional Transformer.
Sensors (Basel). 2023 Jul 7;23(13):6212. doi: 10.3390/s23136212.
6
Two-Way Feature Extraction for Speech Emotion Recognition Using Deep Learning.
Sensors (Basel). 2022 Mar 19;22(6):2378. doi: 10.3390/s22062378.
7
Fusion-ConvBERT: Parallel Convolution and BERT Fusion for Speech Emotion Recognition.
Sensors (Basel). 2020 Nov 23;20(22):6688. doi: 10.3390/s20226688.
8
A Combined CNN Architecture for Speech Emotion Recognition.
Sensors (Basel). 2024 Sep 6;24(17):5797. doi: 10.3390/s24175797.
9
An enhanced speech emotion recognition using vision transformer.
Sci Rep. 2024 Jun 7;14(1):13126. doi: 10.1038/s41598-024-63776-4.
10
Emotion recognition for human-computer interaction using high-level descriptors.
Sci Rep. 2024 May 27;14(1):12122. doi: 10.1038/s41598-024-59294-y.
