Division of Rheumatology, Allergy, and Immunology, Harvard Medical School, Massachusetts General Hospital, Boston, MA, USA.
Department of Computer and Information Sciences, Fordham University, New York, NY, USA.
Lupus. 2022 Oct;31(11):1296-1305. doi: 10.1177/09612033221114805. Epub 2022 Jul 14.
Systemic lupus erythematosus (SLE) is a heterogeneous disease characterized by disease flares which can require hospitalization. Our objective was to apply machine learning methods to predict hospitalizations for SLE from electronic health record (EHR) data.
We identified patients with SLE in a longitudinal EHR-based cohort with ≥2 outpatient rheumatology visits between 2012 and 2019. We applied multiple machine learning methods to predict hospitalizations with a primary diagnosis code for SLE: decision tree, random forest, naive Bayes, logistic regression, and an ensemble method. Candidate predictors were derived from structured EHR features, including demographics, laboratory tests, medications, ICD-9/10 codes for SLE manifestations, and healthcare utilization. We used two approaches to assess these variables over longitudinal follow-up, including the incorporation of lagged features to capture changes in clinical data over time. The performance of each model was evaluated by overall accuracy, the F statistic, and the area under the receiver operating characteristic curve (AUC).
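As a rough illustration of the lagged-feature idea described above, the following Python sketch derives a previous-visit value and a change-from-previous-visit value for one laboratory feature, along with a per-patient mean. The abstract does not specify how the lagged features or the two approaches were constructed, so the function names (`add_lagged_features`, `average_feature`), the `visit_index` field, and the C3 values are all hypothetical and should not be read as the authors' implementation.

```python
from collections import defaultdict


def add_lagged_features(visits, feature):
    """Add the previous-visit value and the change since that visit.

    `visits` is a list of dicts with 'patient_id', 'visit_index', and
    the named feature. This layout is an assumption for illustration.
    """
    by_patient = defaultdict(list)
    for row in visits:
        by_patient[row["patient_id"]].append(row)

    out = []
    for rows in by_patient.values():
        rows = sorted(rows, key=lambda r: r["visit_index"])
        for i, row in enumerate(rows):
            new_row = dict(row)
            # The first visit has no earlier value to lag from.
            prev = rows[i - 1][feature] if i > 0 else None
            new_row[feature + "_lag1"] = prev
            new_row[feature + "_delta"] = (
                row[feature] - prev if prev is not None else None
            )
            out.append(new_row)
    return out


def average_feature(visits, feature):
    """Summarize a feature as its mean over each patient's visits."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in visits:
        totals[row["patient_id"]] += row[feature]
        counts[row["patient_id"]] += 1
    return {pid: totals[pid] / counts[pid] for pid in totals}


# Hypothetical C3 levels (mg/dL) for two made-up patients.
visits = [
    {"patient_id": "A", "visit_index": 0, "c3": 90},
    {"patient_id": "A", "visit_index": 1, "c3": 80},
    {"patient_id": "A", "visit_index": 2, "c3": 65},
    {"patient_id": "B", "visit_index": 0, "c3": 110},
    {"patient_id": "B", "visit_index": 1, "c3": 112},
]

lagged = add_lagged_features(visits, "c3")
means = average_feature(visits, "c3")
```

In this toy data, patient A's falling C3 appears as negative `c3_delta` values, a trend that a single per-patient mean would hide. That contrast is the kind of change over time that lagged features are meant to capture.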
We identified 1996 patients with SLE, of whom 4.6% were hospitalized for SLE in their most recent year of follow-up. Random forest models had the highest performance in predicting SLE hospitalizations, with AUC 0.751 and AUC 0.772 for the two approaches (averaging and progressive), respectively. The leading predictors of SLE hospitalizations included dsDNA positivity, C3 level, blood cell counts, and inflammatory markers as well as age and albumin.
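With only 4.6% of patients hospitalized, overall accuracy alone can be misleading, which is one reason to read it alongside AUC. The sketch below computes accuracy and a rank-based AUC in plain Python for a hypothetical cohort of 1000 patients with 46 events. The cohort size, labels, and scores are illustrative assumptions, not study data, and `roc_auc` and `accuracy` are names chosen here rather than anything from the paper.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    which is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical cohort: 46 hospitalized (1) and 954 not hospitalized (0).
labels = [1] * 46 + [0] * 954

# A trivial model that never predicts hospitalization.
never_predictions = [0] * 1000
never_scores = [0.0] * 1000

print(accuracy(labels, never_predictions))  # 0.954
print(roc_auc(labels, never_scores))        # 0.5

# A small hand-checkable ranking example.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The trivial model reaches 95.4% accuracy while ranking patients no better than chance (AUC 0.5), so AUC values above 0.75 at this event rate carry information that accuracy does not.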
We have demonstrated that machine learning methods can predict SLE hospitalizations. We identified key predictors of these events including known markers of SLE disease activity; further validation in external cohorts is warranted.