Department of Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King's College London, United Kingdom.
Huaxi MR Research Center (HMRRC), Department of Radiology, West China Hospital of Sichuan University, Chengdu, China.
Schizophr Bull. 2020 Jan 4;46(1):17-26. doi: 10.1093/schbul/sby189.
Despite the high level of interest in the use of machine learning (ML) and neuroimaging to detect psychosis at the individual level, the reliability of the findings is unclear because of potential methodological issues that may have inflated the results reported in the existing literature. This study aimed to elucidate the extent to which the application of ML to neuroanatomical data allows detection of first episode psychosis (FEP), while putting in place methodological precautions to avoid overoptimistic results. We tested both traditional ML and an emerging approach known as deep learning (DL) using 3 feature sets of interest: (1) surface-based regional volumes and cortical thickness, (2) voxel-based gray matter volume (GMV), and (3) voxel-based cortical thickness (VBCT). To assess the reliability of the findings, we repeated all analyses in 5 independent datasets, totaling 956 participants (514 FEP and 444 within-site matched controls). Performance was assessed via nested cross-validation (CV) and cross-site CV. Accuracies ranged from 50% to 70% for surface-based features, from 50% to 63% for GMV, and from 51% to 68% for VBCT. The best accuracies (70%) were achieved when DL was applied to surface-based features; however, these models generalized poorly to other sites. Findings from this study suggest that, when methodological precautions are adopted to avoid overoptimistic results, detection of individuals in the early stages of psychosis is more challenging than originally thought. In light of this, we argue that the current evidence for the diagnostic value of ML and structural neuroimaging should be reconsidered toward a more cautious interpretation.
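The two evaluation schemes named in the abstract, nested CV and cross-site CV, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the synthetic two-feature data, the toy k-nearest-neighbour classifier, the candidate values of k, the fold counts, and the helper names (`make_toy_data`, `kfold_indices`, `nested_cv`, `cross_site_cv`). Cross-site CV is shown here in its leave-one-site-out form, which is one common variant; the abstract does not specify the exact scheme the authors used.

```python
import random


def make_toy_data(n_per_site=40, n_sites=3, seed=42):
    """Synthetic data (illustrative only): weak group effect plus a site offset."""
    rng = random.Random(seed)
    X, y, sites = [], [], []
    for s in range(n_sites):
        for j in range(n_per_site):
            label = j % 2  # balanced classes within each site
            X.append([rng.gauss(0.5 * label + s, 1.0), rng.gauss(0.0, 1.0)])
            y.append(label)
            sites.append(s)
    return X, y, sites


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, k):
    """Toy classifier: majority vote among the k nearest training points."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    nearest = order[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(train_idx, test_idx, X, y, k):
    """Fit on train_idx (k-NN just stores the data) and score on test_idx."""
    tX = [X[i] for i in train_idx]
    ty = [y[i] for i in train_idx]
    hits = sum(knn_predict(tX, ty, X[i], k) == y[i] for i in test_idx)
    return hits / len(test_idx)


def nested_cv(X, y, ks=(1, 3, 5), outer_k=5, inner_k=3, seed=0):
    """Outer loop estimates performance; inner loop picks k on training data only."""
    outer_folds = kfold_indices(len(X), outer_k, seed)
    outer_scores = []
    for f, test_idx in enumerate(outer_folds):
        train_idx = [i for g, fold in enumerate(outer_folds) if g != f for i in fold]
        inner_folds = kfold_indices(len(train_idx), inner_k, seed + 1)

        def inner_score(k):
            scores = []
            for h, val_pos in enumerate(inner_folds):
                val = [train_idx[p] for p in val_pos]
                fit = [train_idx[p]
                       for g, fold in enumerate(inner_folds) if g != h
                       for p in fold]
                scores.append(accuracy(fit, val, X, y, k))
            return sum(scores) / len(scores)

        best_k = max(ks, key=inner_score)  # the outer test fold is never used here
        outer_scores.append(accuracy(train_idx, test_idx, X, y, best_k))
    return outer_scores


def cross_site_cv(X, y, sites, k=3):
    """Leave-one-site-out: train on all other sites, test on the held-out site."""
    results = {}
    for held_out in sorted(set(sites)):
        train_idx = [i for i, s in enumerate(sites) if s != held_out]
        test_idx = [i for i, s in enumerate(sites) if s == held_out]
        results[held_out] = accuracy(train_idx, test_idx, X, y, k)
    return results


if __name__ == "__main__":
    X, y, sites = make_toy_data()
    print("nested CV accuracy per outer fold:", nested_cv(X, y))
    print("accuracy per held-out site:", cross_site_cv(X, y, sites))
```

The design point is that the hyperparameter is selected inside the inner loop, so the outer test fold never influences model selection, and that a held-out site contributes no data to training. Both are safeguards of the kind the abstract refers to as precautions against overoptimistic results.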