Department of Emergency and Organ Transplantation, Rheumatology Unit, University of Bari, Bari, Italy.
Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy.
Front Immunol. 2022 Apr 5;13:860877. doi: 10.3389/fimmu.2022.860877. eCollection 2022.
Inferential statistical methods have failed to identify reliable biomarkers and risk factors for relapse of giant cell arteritis (GCA) after glucocorticoids (GCs) tapering. A machine learning (ML) approach can handle complex non-linear relationships between patient attributes that are hard to model with traditional statistical methods, combining them to output a forecast or a probability for a given outcome.
The objective of the study was to assess whether ML algorithms can predict GCA relapse after GCs tapering.
GCA patients who received GCs therapy and attended regular follow-up visits for at least 12 months were retrospectively analyzed, and their data were used to implement 3 ML algorithms: Logistic Regression (LR), Decision Tree (DT), and Random Forest (RF). The outcome of interest was disease relapse within 3 months during GCs tapering. After an ML variable selection step based on an XGBoost wrapper, a core set of attributes was used to train and test each algorithm with 5-fold cross-validation. The performance of each algorithm in both phases was assessed in terms of accuracy and area under the receiver operating characteristic curve (AUROC).
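As a rough illustration of the 5-fold cross-validation step, the sketch below splits a cohort into five folds while keeping the proportion of relapsing patients similar in each fold. Stratification is an assumption made here for illustration; the abstract does not state whether the folds were stratified. The function name `stratified_kfold` and the label vector are hypothetical, and the labels only mirror the cohort size reported in the results (107 patients, 40 relapses), not real patient data.

```python
import random

def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs, keeping class balance similar per fold.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    # Deal the shuffled indices of each class round-robin across the folds.
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    # Each fold serves once as the test set; the rest form the training set.
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

# Placeholder labels: 1 = relapse, 0 = no relapse (cohort shape only).
labels = [1] * 40 + [0] * 67
splits = list(stratified_kfold(labels, k=5))

for n, (train, test) in enumerate(splits, start=1):
    relapses = sum(labels[i] for i in test)
    print(f"fold {n}: train={len(train)} test={len(test)} relapses in test={relapses}")
```

Each model would then be fitted on the training indices and scored on the held-out indices, with the five scores averaged.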
The dataset consisted of 107 GCA patients (73 women, 68.2%) with a mean age (± SD) of 74.1 (± 8.5) years at presentation. GCA flare occurred in 40/107 patients (37.4%) within 3 months after GCs tapering. The ML wrapper yielded a core attribute set with the smallest number of variables for algorithm training: presence or absence of diabetes mellitus, concomitant polymyalgia rheumatica, and erythrocyte sedimentation rate at GCs baseline. RF showed the best performance, being significantly superior to the other algorithms in accuracy (RF 71.4% vs LR 70.4% vs DT 62.9%). Consistently, RF precision (72.1%) was significantly greater than that of LR (62.6%) and DT (50.8%). Conversely, LR was superior to RF and DT in recall (RF 60% vs LR 62.5% vs DT 47.5%). Moreover, the AUROC of RF (0.76) was higher than that of LR (0.73) and DT (0.65).
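For readers less familiar with the reported metrics, the following minimal sketch shows how accuracy, precision, recall, and AUROC are computed from true outcomes and predicted relapse probabilities. The labels, scores, and the 0.5 decision threshold are hypothetical toy values chosen for illustration; they are not study data and do not reproduce the figures above.

```python
def confusion(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = relapse."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def auroc(y_true, scores):
    """Probability that a random relapser is scored above a random non-relapser."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy example: 8 patients, 4 of whom relapsed.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tp, fp, tn, fn = confusion(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)   # share of all patients classified correctly
precision = tp / (tp + fp)           # share of predicted relapses that were real
recall = tp / (tp + fn)              # share of real relapses that were predicted
auc = auroc(y_true, scores)

print(accuracy, precision, recall, auc)
```

Precision and recall depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is why a model can lead on one metric and trail on another.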
The RF algorithm can predict GCA relapse after GCs tapering with sufficient accuracy. To date, this is one of the most accurate predictive models for this outcome. This ML method is a reproducible tool capable of supporting clinicians in GCA patient management.