Smucny Jason, Cannon Tyrone D, Bearden Carrie E, Addington Jean, Cadenhead Kristen S, Cornblatt Barbara A, Keshavan Matcheri, Mathalon Daniel H, Perkins Diana O, Stone William, Walker Elaine F, Woods Scott W, Davidson Ian, Carter Cameron S
Department of Psychiatry, University of California, Davis, Davis, CA, United States.
Department of Psychology, Yale University, New Haven, CT, United States.
Front Psychiatry. 2025 Jan 15;15:1520173. doi: 10.3389/fpsyt.2024.1520173. eCollection 2024.
We previously reported that machine learning could predict conversion to psychosis in individuals at clinical high risk (CHR) for psychosis with up to 90% accuracy using the North American Prodrome Longitudinal Study-3 (NAPLS-3) dataset. A definitive test of that predictive model, which was trained on NAPLS-3 data, requires evaluation in an independent dataset. In this report we tested model generalization on NAPLS-2, the preceding iteration of the study, using the same machine learning algorithms employed in our previous work.
Standard machine learning algorithms were trained to predict conversion to psychosis in clinical high risk individuals on the NAPLS-3 dataset and tested on the NAPLS-2 dataset.
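The train-on-one-cohort, test-on-another design described above can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the data are synthetic, the two features, the 30% conversion rate, and the `shift` parameter are hypothetical, and the hand-written Gaussian Naive Bayes stands in for the "standard machine learning algorithms" without reproducing the study's actual pipeline or features.

```python
import math
import random


def make_cohort(n, shift, rng):
    """Generate a synthetic cohort of (features, label) pairs; label 1 = converter."""
    rows = []
    for _ in range(n):
        label = 1 if rng.random() < 0.3 else 0
        features = [rng.gauss(1.0 * label + shift, 1.0),
                    rng.gauss(0.8 * label + shift, 1.0)]
        rows.append((features, label))
    return rows


def fit_gaussian_nb(rows):
    """Estimate a class prior and per-feature mean/variance for each label."""
    model = {}
    for label in sorted({y for _, y in rows}):
        xs = [x for x, y in rows if y == label]
        stats = []
        for j in range(len(xs[0])):
            col = [x[j] for x in xs]
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[label] = (len(xs) / len(rows), stats)
    return model


def predict(model, x):
    """Return the label with the highest log posterior under the fitted model."""
    def log_posterior(label):
        prior, stats = model[label]
        lp = math.log(prior)
        for v, (mean, var) in zip(x, stats):
            lp += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        return lp
    return max(model, key=log_posterior)


def balanced_accuracy(model, rows):
    """Average of per-class recall (sensitivity and specificity for two classes)."""
    recalls = []
    for label in sorted({y for _, y in rows}):
        members = [x for x, y in rows if y == label]
        hits = sum(1 for x in members if predict(model, x) == label)
        recalls.append(hits / len(members))
    return sum(recalls) / len(recalls)


rng = random.Random(0)
train_rows = make_cohort(500, shift=0.0, rng=rng)     # stand-in for the training cohort
internal_rows = make_cohort(500, shift=0.0, rng=rng)  # held-out, same distribution
external_rows = make_cohort(500, shift=0.5, rng=rng)  # independent cohort, shifted features

model = fit_gaussian_nb(train_rows)
internal_bacc = balanced_accuracy(model, internal_rows)
external_bacc = balanced_accuracy(model, external_rows)
print(f"internal balanced accuracy: {internal_bacc:.2f}")
print(f"external balanced accuracy: {external_bacc:.2f}")
```

The model is fitted once and never refitted on the external cohort, so any gap between the two scores reflects how well the learned feature distributions transfer. Balanced accuracy is used here because converters are the minority class; the metrics reported in the study itself may differ.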
NAPLS-2 and NAPLS-3 participants differed significantly on most features used in the machine learning models. All models performed above chance, with Naive Bayes and random forest methods showing the best overall performance. Importantly, however, overall performance did not match that previously observed when using only NAPLS-3 data.
The results of this study suggest that a machine learning model trained to predict conversion to psychosis on one dataset can be applied to an independent dataset. Performance on the test set, however, was not in the range necessary for clinical application. Possible reasons for the limited performance are discussed.