Lilli Livia, Antenucci Laura, Ortolan Augusta, Bosello Silvia Laura, Patarnello Stefano, Masciocchi Carlotta, Gorini Marco, Castellino Gabriella, Cesario Alfredo, D'Agostino Maria Antonietta, Lenkowicz Jacopo
Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Largo Agostino Gemelli, 8, Rome, 00168, Italy.
Catholic University of the Sacred Heart, Rome, Italy.
JMIR Form Res. 2025 Aug 22;9:e70200. doi: 10.2196/70200.
Systemic lupus erythematosus (SLE) is a chronic disease characterized by a broad spectrum of involved organs, including neurological, renal, and vascular domains, with disease activity manifesting through unpredictable patterns that vary across individuals and over time, making the prediction of activity events particularly challenging.
This paper proposes a hierarchical machine learning model to predict 12-month SLE activity, defined as the occurrence of at least one of the following events within the following year: SLE hospitalization, a new organ-involved domain, or a neurological, renal, or vascular manifestation. At each patient visit, the model combines all features at the current time point with information about the patient's clinical history and the preceding 12 months to predict the outcome for the next 12 months.
The study cohort consists of 262 patients with at least one outpatient visit and one SLE admission from 2012 to 2020 at the Gemelli Hospital in Italy, yielding a retrospective longitudinal dataset of 5962 contacts. The data include demographics, laboratory results, clinical features (eg, domain involvements and manifestations), treatments, and care pathways (eg, contact type, such as outpatient visit, hospitalization, or day hospital, and visit frequency). The variables cover 3 time ranges: the current contact, the last 12 months, and the patient's previous clinical history. The main model was developed by testing different machine learning approaches within a cross-validation setup. Its predicted probabilities were used in a risk stratification analysis that identified 3 groups of predictions: strong, moderate, and mild. Mild samples were then passed through a second, cascade model. Integrating the main model (applied to strong and moderate samples) with the cascade model (applied to mild samples) forms our final hierarchical model.
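The routing between the main and cascade models can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the strong, moderate, and mild groups were delimited, so the sketch assumes they are defined by the distance of the predicted probability from 0.5, and the thresholds (0.4 and 0.2), function names, and toy models are all hypothetical.

```python
from typing import Callable, Sequence

Features = Sequence[float]
Model = Callable[[Features], float]  # returns P(activity event within 12 months)


def confidence_band(prob: float, strong: float = 0.4, moderate: float = 0.2) -> str:
    """Label a prediction by its distance from 0.5 (thresholds are hypothetical)."""
    margin = abs(prob - 0.5)
    if margin >= strong:
        return "strong"
    if margin >= moderate:
        return "moderate"
    return "mild"


def hierarchical_predict(
    x: Features, main_model: Model, cascade_model: Model
) -> tuple[float, str]:
    """Keep the main model's output for strong/moderate predictions;
    defer to the cascade model when the main prediction is mild."""
    prob = main_model(x)
    band = confidence_band(prob)
    if band == "mild":
        return cascade_model(x), band
    return prob, band


# Toy stand-ins for the fitted models (hypothetical, for illustration only).
def toy_main(x: Features) -> float:
    return min(max(x[0], 0.0), 1.0)


def toy_cascade(x: Features) -> float:
    return 1.0 if x[1] > 0 else 0.0
```

In practice, `main_model` and `cascade_model` would wrap the fitted random forest and decision tree; only the mild contacts ever reach the second model.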
The hierarchical model, an ensemble of the main random forest and the cascade decision tree, improved performance: the area under the receiver operating characteristic curve rose from 0.696 (95% CI 0.672-0.719) for the original main model to 0.743 (95% CI 0.717-0.769), with the gain most evident for specific patient characteristics. Using explainable artificial intelligence methods, we also identified the key features that most strongly influence the model's predictions. Among the 185 collected features, 15 emerged as the most impactful, including age at contact, response to therapy modifications, abnormal laboratory tests, and clinical manifestations. This analysis enhances model transparency, which is essential for fostering the adoption of artificial intelligence in health care settings.
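For readers unfamiliar with how an area under the curve and its interval can be obtained, the sketch below computes the AUC as the probability that a positive contact is scored above a negative one and derives a percentile bootstrap interval. The abstract does not state how the reported 95% CIs were computed, so the bootstrap scheme, the function names, and the defaults here are assumptions.

```python
import random
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (contact-level resampling, an assumption)."""
    if len(set(y_true)) < 2:
        raise ValueError("need both classes to bootstrap the AUC")
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that lack one of the two classes
        stats.append(roc_auc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(int((1 - alpha / 2) * n_boot), n_boot - 1)]
    return lower, upper
```

Because each patient contributes several contacts, resampling whole patients rather than individual contacts may give a more realistic interval; the abstract does not say which unit was used.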
Our study introduces an explainable and reliable tool for predicting 1-year SLE activity, supporting physicians with an advanced decision-support system to improve patient management. The model identifies key features that may help characterize patient phenotypes, enabling personalized treatment plans and better outcomes. In addition, the methodology can be generalized for predictive analytics in other chronic autoimmune diseases.