Hearing vocals to recognize schizophrenia: speech discriminant analysis with fusion of emotions and features based on deep learning.

Author information

Huang Jie, Zhao Yanli, Tian Zhanxiao, Qu Wei, Du Xia, Zhang Jie, Zhang Meng, Tan Yunlong, Wang Zhiren, Tan Shuping

Affiliation

Beijing HuiLongGuan Hospital, Peking University HuiLongGuan Clinical Medical School, Changping District, Beijing, 100096, China.

Publication information

BMC Psychiatry. 2025 May 8;25(1):466. doi: 10.1186/s12888-025-06888-z.

Abstract

BACKGROUND AND OBJECTIVE

Schizophrenia is a complex and heterogeneous mental disorder, which makes its accurate detection a major challenge. Current diagnostic criteria rely primarily on clinical symptoms, which may not fully capture individual differences or the heterogeneity of the disorder. In this study, a deep-learning-based discriminative model of schizophrenic speech was developed that combines different emotional stimuli and features.

METHODS

A total of 156 patients with schizophrenia and 74 healthy controls participated in the study; each participant read three fixed texts carrying different emotional stimuli. Log-Mel spectrograms and Mel-frequency cepstral coefficients (MFCCs) were extracted with the librosa toolkit (version 0.9.2), and convolutional neural networks (CNNs) were applied to the log-Mel spectrograms. The study examined how different emotional stimuli, and the fusion of demographic information and MFCCs, affected schizophrenia detection.
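The abstract does not say how the features were fused. A minimal sketch of one common option, early (feature-level) fusion by concatenation, is shown below. The CNN embedding, the mean/standard-deviation summary of the MFCCs, and the demographic variables (age and sex here) are assumptions for illustration, not the authors' method. The MFCC matrix is assumed to follow the `(n_mfcc, n_frames)` layout that `librosa.feature.mfcc` returns, i.e. one row per coefficient.

```python
from statistics import mean, pstdev


def summarize_mfcc(mfcc):
    """Collapse an MFCC matrix into a fixed-length vector.

    Assumes librosa's (n_mfcc, n_frames) layout: one row per coefficient.
    Returns the per-coefficient means followed by the per-coefficient
    population standard deviations (a hypothetical summary choice).
    """
    means = [mean(row) for row in mfcc]
    stds = [pstdev(row) for row in mfcc]
    return means + stds


def fuse_features(cnn_embedding, mfcc, demographics):
    """Early fusion by simple concatenation (illustrative, not the paper's method).

    cnn_embedding: vector a CNN produced from the log-Mel spectrogram (assumed).
    mfcc: MFCC matrix in (n_mfcc, n_frames) layout.
    demographics: numeric demographic values, e.g. [age, sex] (hypothetical).
    """
    return (
        [float(x) for x in cnn_embedding]
        + summarize_mfcc(mfcc)
        + [float(d) for d in demographics]
    )


if __name__ == "__main__":
    # Toy inputs: a 2-D embedding, 2 MFCCs over 2 frames, age 30 and sex coded as 1.
    print(fuse_features([0.5, -0.5], [[1.0, 3.0], [2.0, 2.0]], [30, 1]))
```

In practice the fused vector would normally be scaled (for example, z-scored per dimension) before it reaches a classifier, because raw ages and MFCC values lie on very different ranges.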

RESULTS

Discriminant analysis performed better under neutral emotional stimuli than under positive or negative stimuli. Integrating the different emotional stimuli and fusing the speech features with demographic information improved both sensitivity and specificity. The best discriminant model achieved an accuracy of 91.7%, a sensitivity of 94.9%, a specificity of 85.1%, and a ROC-AUC of 0.963.
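The reported metrics follow standard definitions, with patients as the positive class. The sketch below computes accuracy, sensitivity, and specificity from a confusion matrix, and computes ROC-AUC in its rank form: the probability that a randomly chosen patient receives a higher score than a randomly chosen control, with ties counted as one half. The counts in the usage lines are hypothetical. They are simply one set of counts over all 230 participants that reproduces the reported percentages; the abstract reports neither a confusion matrix nor the evaluation split.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity, with patients as the positive class."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)  # share of patients correctly flagged
    specificity = tn / (tn + fp)  # share of controls correctly cleared
    return accuracy, sensitivity, specificity


def roc_auc(pos_scores, neg_scores):
    """ROC-AUC via pairwise comparison (Mann-Whitney form); ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical counts: 156 patients (148 + 8) and 74 controls (63 + 11).
    acc, sens, spec = confusion_metrics(tp=148, fn=8, tn=63, fp=11)
    print(f"accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
    # prints: accuracy=0.917 sensitivity=0.949 specificity=0.851
```

The pairwise loop is quadratic in the number of participants, which is negligible at this sample size. For larger sets, a sort-based rank computation gives the same value.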

CONCLUSIONS

Speech recorded under neutral emotional stimulation showed larger differences between patients with schizophrenia and healthy controls, which improved the discrimination of schizophrenia. Integrating different emotions, demographic information, and MFCCs improved the accuracy of schizophrenia detection. This study provides a methodological foundation for building personalized speech-based detection models for schizophrenia.
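The abstract also does not specify how the three emotional conditions were integrated. One simple possibility, shown here purely as an assumption, is late fusion. Separate per-stimulus models each assign a schizophrenia probability to the same participant, and those probabilities are averaged, optionally with weights (for instance, more weight on the neutral text, which discriminated best). The function name, stimulus labels, weights, and 0.5 threshold are all hypothetical.

```python
def late_fusion(probs_by_stimulus, weights=None, threshold=0.5):
    """Combine per-stimulus schizophrenia probabilities for one participant.

    probs_by_stimulus: dict mapping a stimulus label (hypothetical, e.g.
        "positive", "neutral", "negative") to the probability output by a
        model trained on that text.
    weights: optional dict of per-stimulus weights; equal weights by default.
    Returns (fused probability, label), where label 1 = schizophrenia, 0 = control.
    """
    if weights is None:
        weights = {stimulus: 1.0 for stimulus in probs_by_stimulus}
    total = sum(weights[s] for s in probs_by_stimulus)
    fused = sum(weights[s] * p for s, p in probs_by_stimulus.items()) / total
    return fused, int(fused >= threshold)


if __name__ == "__main__":
    probs = {"positive": 0.4, "neutral": 0.8, "negative": 0.6}
    fused, label = late_fusion(probs)
    print(f"fused={fused:.2f} label={label}")
    # prints: fused=0.60 label=1
```

Averaging probabilities keeps each per-stimulus model independent. Early fusion, by contrast, would train a single model on concatenated features from all three readings.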

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e55/12060412/13aed2081720/12888_2025_6888_Fig1_HTML.jpg
