Xu Jiaojiao, Zhang Wei, Bai Weili, Gai Nannan, Li Jing, Bao Yunqi
Department of Rheumatology, Xi'an Fifth Hospital, 112 Xiguanzheng Street, Lianhu District, Xi'an, Shaanxi, 710000, People's Republic of China.
BMC Pulm Med. 2025 Aug 14;25(1):394. doi: 10.1186/s12890-025-03855-y.
Interstitial lung disease (ILD) is a severe complication affecting 10-30% of rheumatoid arthritis (RA) patients. Current diagnostic methods typically detect ILD only after substantial lung damage has occurred. This delay emphasizes the need for early detection strategies. This study aims to develop and validate machine learning models for early RA-ILD prediction and identify key predictive biomarkers.
We conducted a cross-sectional study enrolling 149 RA patients (84 with ILD, 65 without ILD) between January 2020 and December 2023. We evaluated demographic characteristics, clinical parameters, and laboratory markers, including inflammatory indicators, hematological parameters, and specific biomarkers. We developed four machine learning (ML) models (XGBoost, Random Forest, Support Vector Machine, and Logistic Regression) and compared their ability to predict ILD.
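Classifiers such as these are commonly compared by area under the ROC curve (AUC) with a confidence interval. The abstract does not state how its interval was obtained, so the following is only a minimal sketch, assuming a percentile bootstrap. The labels and scores are synthetic stand-ins (only the 84/65 group sizes are taken from the study), and the function names `auc` and `bootstrap_ci` are hypothetical.

```python
import random

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contained only one class; draw again
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Synthetic cohort: 84 ILD cases (label 1) and 65 non-ILD cases (label 0).
# The scores are random draws, NOT study data.
rng = random.Random(42)
labels = [1] * 84 + [0] * 65
scores = [rng.gauss(0.65, 0.15) for _ in range(84)] + \
         [rng.gauss(0.40, 0.15) for _ in range(65)]

point = auc(labels, scores)
lower, upper = bootstrap_ci(labels, scores, n_boot=500)
print(f"AUC = {point:.3f} (95% CI: {lower:.3f}-{upper:.3f})")
```

In a real comparison, each model's held-out predicted probabilities would take the place of `scores`, and the same resampled indices would be reused across models so that the intervals are comparable.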
The XGBoost model demonstrated superior predictive performance (AUC = 0.891, 95% CI: 0.847-0.935). Feature importance analysis identified Krebs von den Lungen-6 (KL-6) as the strongest predictor (importance score = 0.285), followed by interleukin-6 (IL-6) and cytokeratin 19 fragment (CYFRA21-1). The ILD group exhibited significantly elevated levels of inflammatory markers and specific biomarkers, particularly KL-6 (826.4 ± 458.2 vs. 285.6 ± 124.8 U/ml, P < 0.001), alongside distinct patterns in hematological parameters.
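The reported KL-6 difference can be sanity-checked from the summary statistics alone. The sketch below assumes that the ± values are standard deviations (not standard errors), that KL-6 was measured in all 84 and 65 patients, and that a Welch t-test is an acceptable approximation; the abstract does not say which test the authors used. The helper names are hypothetical.

```python
import math

def welch_from_summary(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    computed from group means, standard deviations, and sizes."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)

# KL-6 (U/ml) as reported: ILD group vs. non-ILD group.
# Treating "±" as SD and using the full group sizes are assumptions.
ild = (826.4, 458.2, 84)
non_ild = (285.6, 124.8, 65)

t_stat, dof = welch_from_summary(*ild, *non_ild)
effect = cohens_d(*ild, *non_ild)
print(f"t = {t_stat:.2f}, df = {dof:.1f}, Cohen's d = {effect:.2f}")
```

Under these assumptions the t statistic comes out at roughly 10, which is consistent with the reported P < 0.001. The much larger spread in the ILD group is the reason for preferring Welch's test over a pooled-variance t-test here.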
Machine learning approaches, particularly XGBoost, show promise for early RA-ILD prediction. Integrating KL-6 and the other identified biomarkers into clinical screening protocols may facilitate early detection and improve patient outcomes. These findings suggest that machine learning models could serve as tools for risk stratification and early intervention in RA-ILD management, supporting individualized risk assessment in clinical practice.