Huang Jie, Zhao Yanli, Tian Zhanxiao, Qu Wei, Du Xia, Zhang Jie, Zhang Meng, Tan Yunlong, Wang Zhiren, Tan Shuping
Beijing HuiLongGuan Hospital, Peking University HuiLongGuan Clinical Medical School, Changping District, Beijing, 100096, China.
BMC Psychiatry. 2025 May 8;25(1):466. doi: 10.1186/s12888-025-06888-z.
Schizophrenia is a complex and heterogeneous mental disorder, and its accurate detection remains a major challenge. Current diagnostic criteria rely primarily on clinical symptoms, which may not fully capture individual differences or the heterogeneity of the disorder. In this study, a deep learning-based model for discriminating schizophrenic speech was developed that combines different emotional stimuli and features.
A total of 156 schizophrenia patients and 74 healthy controls participated in the study, each reading three fixed texts designed to elicit different emotional stimuli. Log-Mel spectrograms and Mel-frequency cepstral coefficients (MFCCs) were extracted using the librosa-0.9.2 toolkit, and convolutional neural networks were applied to the log-Mel spectrograms. The effects of different emotional stimuli, and of fusing demographic information with MFCCs, on schizophrenia detection were examined.
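For illustration, the feature-extraction step described above can be sketched with librosa as follows. This is a minimal example assuming 16 kHz mono recordings; the window, hop, Mel-band, and MFCC settings are illustrative defaults, not the parameters used in the paper.

```python
# Minimal sketch of log-Mel spectrogram and MFCC extraction with librosa-0.9.2.
# Sampling rate, FFT/hop sizes, and feature dimensions are assumed, not the paper's settings.
import librosa
import numpy as np

def extract_features(wav_path, sr=16000, n_mels=80, n_mfcc=13):
    """Return a log-Mel spectrogram and an MFCC matrix for one recording."""
    y, sr = librosa.load(wav_path, sr=sr, mono=True)

    # Mel power spectrogram converted to the log (dB) scale: the CNN input described above
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                         hop_length=256, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)

    # Frame-level MFCCs, which the study fuses with demographic information
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

    return log_mel, mfcc
```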
The discriminant analysis results showed superior performance for neutral emotional stimuli compared to positive and negative stimuli. Integrating different emotional stimuli and fusing features with personal information improved sensitivity and specificity. The best discriminant model achieved an accuracy of 91.7%, sensitivity of 94.9%, specificity of 85.1%, and ROC-AUC of 0.963.
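For reference, the reported metrics follow standard definitions. The snippet below shows how accuracy, sensitivity, specificity, and ROC-AUC would be computed from a binary classifier's predictions and scores; the arrays are placeholder data, not the study's results.

```python
# Illustrative computation of accuracy, sensitivity, specificity, and ROC-AUC.
# y_true, y_score, and the 0.5 threshold are placeholders for demonstration only.
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0])                 # 1 = patient, 0 = control
y_score = np.array([0.9, 0.8, 0.3, 0.7, 0.6, 0.2, 0.95, 0.1])
y_pred = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = accuracy_score(y_true, y_pred)
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate
auc = roc_auc_score(y_true, y_score)

print(f"acc={accuracy:.3f} sens={sensitivity:.3f} spec={specificity:.3f} auc={auc:.3f}")
```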
Speech produced under neutral emotional stimulation showed greater differences between schizophrenia patients and healthy controls, enhancing the discriminative analysis of schizophrenia. Integrating different emotions, demographic information, and MFCCs improved the accuracy of schizophrenia detection. This study provides a methodological foundation for constructing a personalized speech-based detection model for schizophrenia.