Felix Christina, Johnston Joshua D, Owen Kelsey, Shirima Emil, Hinds Sidney R, Mandl Kenneth D, Milinovich Alex, Alberts Jay L
Neurological Institute, Cleveland Clinic, Cleveland, OH, USA.
Department of Biomedical Engineering, Cleveland Clinic, Cleveland, OH, USA.
Digit Health. 2024 Apr 28;10:20552076241249286. doi: 10.1177/20552076241249286. eCollection 2024 Jan-Dec.
This study assesses the application of interpretable machine learning modeling using electronic medical record data for the prediction of conversion to neurological disease.
A retrospective dataset was compiled of Cleveland Clinic patients diagnosed with Alzheimer's disease, amyotrophic lateral sclerosis, multiple sclerosis, or Parkinson's disease, together with controls matched on age, sex, race, and ethnicity. Individualized risk prediction models were created using eXtreme Gradient Boosting for each neurological disease at four timepoints in patient history. The prediction models were assessed for transparency and fairness.
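Building a model "at a timepoint" implies restricting each patient's record to what was available that long before diagnosis. The following is a minimal sketch of that censoring step, not the authors' pipeline: the record layout, the function names `months_before` and `records_available`, and the choice to include events dated exactly on the cutoff are all assumptions made for illustration. For matched controls, an index date (for example, the matched case's diagnosis date) would have to stand in for `diagnosis_date`; the abstract does not say how this was handled.

```python
import calendar
from datetime import date

# Offsets in months before diagnosis, mirroring the four timepoints in the abstract.
OFFSETS_MONTHS = (0, 12, 24, 60)


def months_before(d: date, months: int) -> date:
    """Return the date `months` calendar months before `d`.

    The day is clamped to the last valid day of the target month
    (e.g. one month before 31 March is 28 or 29 February).
    """
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def records_available(records, diagnosis_date: date, offset_months: int):
    """Keep only records dated on or before the cutoff (hypothetical layout).

    Each record is assumed to be a dict with a "date" key holding a datetime.date.
    """
    cutoff = months_before(diagnosis_date, offset_months)
    return [r for r in records if r["date"] <= cutoff]


# Toy patient history (fabricated purely to exercise the functions).
history = [
    {"date": date(2016, 1, 10), "event": "weight measurement"},
    {"date": date(2019, 6, 1), "event": "fall reported"},
    {"date": date(2020, 3, 1), "event": "sleep assessment"},
]
diagnosis = date(2020, 3, 31)
counts = [len(records_available(history, diagnosis, m)) for m in OFFSETS_MONTHS]
```

Features for each of the four models would then be derived only from the records that survive the corresponding cutoff, so that later information cannot leak into an earlier-timepoint prediction.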
At 0, 12, 24, and 60 months prior to diagnosis, the models achieved the following areas under the receiver operating characteristic curve (AUROC) on a holdout test dataset: Alzheimer's disease, 0.794, 0.742, 0.709, and 0.645; amyotrophic lateral sclerosis, 0.883, 0.710, 0.658, and 0.620; multiple sclerosis, 0.922, 0.877, 0.849, and 0.781; and Parkinson's disease, 0.809, 0.738, 0.700, and 0.651, respectively.
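For readers interpreting these figures, the area under the receiver operating characteristic curve equals the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half. The sketch below computes it directly from that definition. The function name `auroc` and the toy labels and scores are illustrative assumptions and are unrelated to the study data.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0 (control) or 1 (case)
    scores: iterable of predicted risks, same length as labels
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # case ranked above control
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example: of the four case-control pairs, three are ranked correctly.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = auroc(toy_labels, toy_scores)
```

On this scale 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why the values reported above fall as the prediction window moves further from diagnosis. The pairwise loop is quadratic in sample size; it is written for clarity rather than for use on large cohorts.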
The results demonstrate that electronic medical records contain latent information that can be used for risk stratification for neurological disorders. In particular, patient-reported outcomes, sleep assessments, falls data, additional disease diagnoses, and longitudinal changes in patient health, such as weight change, are important predictors.