Taking a look at your speech: identifying diagnostic status and negative symptoms of psychosis using convolutional neural networks.

Author Information

Melshin Gleb, DiMaggio Anthony, Zeramdini Nadia, MacKinley Michael, Palaniyappan Lena, Voppel Alban

Affiliations

Douglas Research Centre, McGill University, Montreal, QC, Canada.

Faculty of Medicine, McGill University, Montreal, QC, Canada.

Publication Information

NPP Digit Psychiatry Neurosci. 2025;3(1):19. doi: 10.1038/s44277-025-00040-1. Epub 2025 Jul 8.

Abstract

Speech-based indices are promising objective biomarkers for identifying schizophrenia and monitoring symptom burden. Static acoustic features show potential but often overlook the time-varying acoustic cues that clinicians naturally evaluate during clinical interviews, such as those reflecting negative symptoms. A similarly dynamic, unfiltered approach can be applied using speech spectrograms, which preserve acoustic-temporal nuances. Here, we investigate whether this method has the potential to assist in determining diagnostic and symptom severity status. Speech recordings from 319 participants (227 with schizophrenia spectrum disorders, 92 healthy controls) were segmented into 10 s fragments of uninterrupted audio (n = 110,246) and transformed into log-Mel spectrograms to preserve both acoustic and temporal features. Participants were partitioned into training (70%), validation (15%), and test (15%) datasets without overlap. Modified ResNet-18 convolutional neural networks (CNNs) performed three classification tasks: (1) schizophrenia-spectrum participants vs. healthy controls and, within 179 clinically rated patients, (2) individuals with more severe vs. less severe negative symptom burden and (3) clinically obvious vs. subtle blunted affect. Grad-CAM was used to visualize the salient regions of the spectrograms that contributed to classification. CNNs distinguished schizophrenia-spectrum participants from healthy controls with 87.8% accuracy (AUC = 0.86). The classifier trained on negative symptom burden performed with somewhat less accuracy (80.5%; AUC = 0.73), but the model detecting blunted affect above a predefined clinical threshold achieved 87.8% accuracy (AUC = 0.79). Importantly, the acoustic information contributing to diagnostic classification was distinct from that identifying blunted affect. Grad-CAM visualization indicated that the CNNs targeted regions consistent with human speech signals at the utterance level, highlighting clinically relevant vocal patterns. Our results suggest that spectrogram-based CNN analyses of short conversational segments can robustly detect schizophrenia-spectrum disorders and ascertain the burden of negative symptoms. This interpretable framework underscores how time-frequency feature maps of natural speech may facilitate more nuanced detection and tracking of negative symptoms in schizophrenia.
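To make the pipeline described above concrete, the sketch below shows how a 10 s audio segment could be converted to a log-Mel spectrogram and passed to a ResNet-18 adapted for single-channel, two-class input. This is a minimal illustration assuming PyTorch with torchaudio and torchvision; the spectrogram parameters (sampling rate, n_fft, hop_length, n_mels) and the exact ResNet-18 modifications are not reported in the abstract, so the values here are assumptions rather than the authors' configuration.

```python
# Illustrative sketch: 10 s audio segment -> log-Mel spectrogram -> modified ResNet-18.
# Parameter values and the specific ResNet-18 changes are assumptions, not paper details.
import torch
import torch.nn as nn
import torchaudio
from torchvision.models import resnet18

SAMPLE_RATE = 16_000          # assumed sampling rate
SEGMENT_SECONDS = 10          # 10 s fragments, as in the abstract

# Log-Mel spectrogram transform: preserves both spectral (acoustic) and temporal structure.
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=SAMPLE_RATE,
    n_fft=1024,               # assumed FFT window size
    hop_length=256,           # assumed hop length
    n_mels=128,               # assumed number of Mel bands
)
to_db = torchaudio.transforms.AmplitudeToDB()

def segment_to_logmel(waveform: torch.Tensor) -> torch.Tensor:
    """waveform: (1, SAMPLE_RATE * SEGMENT_SECONDS) mono audio -> (1, n_mels, frames)."""
    return to_db(mel(waveform))

def build_classifier() -> nn.Module:
    """ResNet-18 modified for 1-channel spectrogram input and a 2-class output head."""
    model = resnet18(weights=None)
    model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
    model.fc = nn.Linear(model.fc.in_features, 2)  # e.g. patient vs. control
    return model

if __name__ == "__main__":
    fake_segment = torch.randn(1, SAMPLE_RATE * SEGMENT_SECONDS)  # stand-in for one 10 s clip
    spec = segment_to_logmel(fake_segment)           # (1, 128, ~626)
    logits = build_classifier()(spec.unsqueeze(0))   # add batch dim -> (1, 2)
    print(spec.shape, logits.shape)
```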

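The abstract also mentions Grad-CAM for highlighting which spectrogram regions drive each decision. Below is a minimal, self-contained sketch of Grad-CAM applied to a classifier like the one in the previous snippet; the choice of model.layer4[-1] as the target layer and all other details are illustrative assumptions, not the authors' implementation.

```python
# Minimal Grad-CAM sketch (an assumed reimplementation, not the authors' code).
# It weights the last convolutional feature maps by the gradient of the target
# class score, yielding a coarse time-frequency saliency map over the spectrogram.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

def grad_cam(model: nn.Module, spec_batch: torch.Tensor, target_class: int) -> torch.Tensor:
    """spec_batch: (B, 1, n_mels, frames) -> (B, n_mels, frames) saliency in [0, 1]."""
    feats, grads = [], []
    layer = model.layer4[-1]  # last residual block: assumed target layer
    h1 = layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))

    model.zero_grad()
    logits = model(spec_batch)
    logits[:, target_class].sum().backward()   # gradients of the target class score
    h1.remove(); h2.remove()

    fmap, grad = feats[0], grads[0]                  # each (B, 512, h, w)
    weights = grad.mean(dim=(2, 3), keepdim=True)    # global-average-pooled gradients
    cam = F.relu((weights * fmap).sum(dim=1, keepdim=True))         # (B, 1, h, w)
    cam = F.interpolate(cam, size=spec_batch.shape[-2:], mode="bilinear",
                        align_corners=False).squeeze(1)             # upsample to input size
    return cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)        # normalise per segment

if __name__ == "__main__":
    # Same single-channel, two-class ResNet-18 adaptation as in the previous sketch.
    model = resnet18(weights=None)
    model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
    model.fc = nn.Linear(model.fc.in_features, 2)
    spec = torch.randn(2, 1, 128, 626)               # stand-in log-Mel spectrograms
    print(grad_cam(model, spec, target_class=1).shape)  # torch.Size([2, 128, 626])
```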

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c0d2/12237691/9998ecffa5c6/44277_2025_40_Fig1_HTML.jpg
