Olah Julianna, Wong Win Lee Edwin, Chaudhry Atta-Ul Raheem Rana, Mena Omar, Tang Sunny X
Psyrin Ltd., London, UK.
Psychiatry Research, Feinstein Institutes for Medical Research.
medRxiv. 2024 Sep 4:2024.09.03.24313020. doi: 10.1101/2024.09.03.24313020.
Psychosis poses substantial social and healthcare burdens. The analysis of speech is a promising approach for the diagnosis and monitoring of psychosis, capturing symptoms like thought disorder and flattened affect. Recent advancements in Natural Language Processing (NLP) methodologies enable the automated extraction of informative speech features, which has been leveraged for early psychosis detection and assessment of symptomatology. However, critical gaps persist, including the absence of standardized sample collection protocols, small sample sizes, and a lack of multi-illness classification, limiting clinical applicability. Our study aimed to (1) identify an optimal assessment approach for the online and remote collection of speech in the context of assessing the psychosis spectrum, and (2) evaluate whether a fully automated, speech-based machine learning (ML) pipeline can discriminate among different conditions on the schizophrenia-bipolar spectrum (SSD-BD-SPE), help-seeking comparison subjects (MDD), and healthy controls (HC) at varying layers of analysis and diagnostic complexity.
We adopted online data collection methods to collect 20 minutes of speech and demographic information from individuals. Participants were categorized as "healthy" help-seekers (HC), having a schizophrenia-spectrum disorder (SSD), bipolar disorder (BD), major depressive disorder (MDD), or being on the psychosis spectrum with sub-clinical psychotic experiences (SPE). SPE status was determined based on self-reported clinical diagnosis and responses to the PHQ-8 and PQ-16 screening questionnaires, while other diagnoses were determined based on self-report from participants. Linguistic and paralinguistic features were extracted, and ensemble learning algorithms (e.g., XGBoost) were used to train models. A 70%-30% train-test split and 30-fold cross-validation were used to validate model performance.
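The validation scheme described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the helper names `stratified_split` and `k_fold`, the per-class stratification, the group sizes, and the choice to run the 30 folds inside the 70% training portion are all assumptions, since the abstract does not specify these details.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), holding out ~test_fraction of each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idx = by_class[label][:]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def k_fold(indices, k=30, seed=0):
    """Yield (fit_idx, val_idx) pairs; every index is validated exactly once."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        fit_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit_idx, val_idx


# Hypothetical labels: 200 participants per group (not the study's real counts).
labels = [g for g in ("HC", "SPE", "SSD", "BD", "MDD") for _ in range(200)]

# 70%-30% split, then 30-fold cross-validation on the training portion (assumed).
train_idx, test_idx = stratified_split(labels, test_fraction=0.3)
folds = list(k_fold(train_idx, k=30))
print(len(train_idx), len(test_idx), len(folds))
```

In a full pipeline, a classifier such as a gradient-boosted ensemble would be fitted on each `fit_idx` and scored on the matching `val_idx`, with the held-out `test_idx` used only once for the final estimate.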
The final analysis sample included 1140 individuals and 22,650 minutes of speech. Using 5 minutes of speech, our model could discriminate between HC and those with a serious mental illness (SSD or BD) with 86% accuracy (AUC = 0.91, recall = 0.7, precision = 0.98). Furthermore, our model could discern among HC, SPE, BD and SSD groups with 86% accuracy (macro F1 = 0.855, macro recall = 0.86, macro precision = 0.86). Finally, in a 5-class discrimination task including individuals with MDD, our model had 76% accuracy (macro F1 = 0.757, macro recall = 0.758, macro precision = 0.766).
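For readers less familiar with the macro-averaged metrics reported above, the following sketch shows how accuracy and macro precision, recall, and F1 are conventionally computed from true and predicted labels. It uses the standard definitions rather than the authors' code; the function names and the toy labels are hypothetical and carry no study data.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_metrics(y_true, y_pred):
    """Accuracy plus unweighted per-class averages of precision, recall, F1."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1_score(precision, recall))
    n = len(classes)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision_macro": sum(precisions) / n,
        "recall_macro": sum(recalls) / n,
        "f1_macro": sum(f1s) / n,
    }


# Toy labels for illustration only (hypothetical, not study data).
y_true = ["HC", "HC", "SSD", "SSD", "BD", "BD"]
y_pred = ["HC", "SSD", "SSD", "SSD", "BD", "HC"]
metrics = macro_metrics(y_true, y_pred)
print(metrics)

# F1 implied by the binary task's reported precision (0.98) and recall (0.7).
binary_f1 = f1_score(0.98, 0.7)
print(round(binary_f1, 3))
```

Because macro averaging weights every class equally regardless of its size, macro scores close to the overall accuracy, as in the 4-class and 5-class results, suggest that performance is not dominated by a single large group.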
Our ML pipeline demonstrated disorder-specific learning, achieving excellent or good accuracy across several classification tasks. We demonstrated that the screening of mental disorders is possible via a fully automated, remote speech assessment pipeline. We tested our model on a relatively high number of conditions (5 classes) compared with the literature, and on a stratified sample of the psychosis spectrum including HC, SPE, SSD and BD (4 classes). We tested our model on a large sample (N = 1150) and demonstrated best-in-class accuracy with remotely collected speech data in the psychosis spectrum; however, further clinical validation is needed to test the reliability of model performance.