Li Hui, Zou Ruyi, Xin Hongxia, He Ping, Xi Bin, Tian Yaqiong, Zhao Qi, Yan Xin, Qiu Xiaohua, Gao Yujuan, Liu Yin, Cao Min, Chen Bi, Han Qian, Chen Juan, Wang Guochun, Cai Hourong
Department of Respiratory and Critical Care Medicine, Nanjing Drum Tower Hospital, Affiliated Hospital of Medical School, Nanjing University, Nanjing, China.
Department of Pulmonary and Critical Care Medicine, Seventh Affiliated Hospital, Sun Yat-sen University, Shenzhen, China.
J Med Internet Res. 2025 Feb 5;27:e62836. doi: 10.2196/62836.
Patients with anti-melanoma differentiation-associated gene 5 antibody-positive dermatomyositis-associated interstitial lung disease (anti-MDA5+DM-ILD) are susceptible to rapidly progressive interstitial lung disease (RP-ILD) and have a high risk of mortality. A reliable prediction model, accessible through an easy-to-use web-based tool, is urgently needed to evaluate the risk of death.
This study aimed to develop and validate a risk prediction model of 3-month mortality using machine learning (ML) in a large multicenter cohort of patients with anti-MDA5+DM-ILD in China.
In total, 609 consecutive patients with anti-MDA5+DM-ILD were retrospectively enrolled from 6 hospitals across China. Patient demographics and laboratory and clinical parameters were collected on admission. The primary endpoint was all-cause mortality at 3 months. Six ML algorithms (Extreme Gradient Boosting [XGBoost], logistic regression [LR], Light Gradient Boosting Machine [LightGBM], random forest [RF], support vector machine [SVM], and k-nearest neighbor [KNN]) were applied to construct and evaluate the model.
After applying inclusion and exclusion criteria, 509 (83.6%) of the 609 patients were included in our study and divided into a training cohort (n=203, 39.9%), an internal validation cohort (n=51, 10.0%), and 2 external validation cohorts (n=92, 18.1%, and n=163, 32.0%). ML identified 8 variables as important for model construction: RP-ILD, erythrocyte sedimentation rate (ESR), serum albumin (ALB) level, age, C-reactive protein (CRP) level, aspartate aminotransferase (AST) level, lactate dehydrogenase (LDH) level, and the neutrophil-to-lymphocyte ratio (NLR). LR was chosen as the best algorithm for model construction. The model achieved an area under the receiver operating characteristic (ROC) curve (AUC) of 0.90, a sensitivity of 85.0%, and a specificity of 83.9% on the training data set, and an AUC of 0.866, a sensitivity of 84.8%, and a specificity of 84.4% on the validation data set. Calibration curves and decision curve analysis (DCA) confirmed the model's accuracy and clinical applicability. Moreover, the model showed strong predictive performance in the external validation cohorts (cohort 1: AUC=0.836, 95% CI 0.754-0.916; cohort 2: AUC=0.915, 95% CI 0.871-0.959), indicating good generalizability. This model was integrated into a web-based tool to predict 3-month mortality for patients with anti-MDA5+DM-ILD.
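For readers less familiar with the reported metrics, the following is a minimal sketch of how AUC, sensitivity, and specificity can be computed from predicted probabilities. It is not the authors' code. The function names, the 0.5 threshold, and the toy labels and probabilities are all assumptions made for illustration and have no connection to the study data.

```python
from typing import Sequence, Tuple


def auc_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the fraction of (death, survivor) pairs in which the death
    case received the higher predicted risk; ties count as half a win."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    A case is predicted positive when its probability is >= threshold."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Toy example (hypothetical values, NOT study data): 1 = died within 3 months.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_probs = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

toy_auc = auc_score(toy_labels, toy_probs)
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_probs)
```

The pairwise definition of AUC used here is quadratic in the number of patients, which is acceptable for cohorts of this size; a rank-based formula gives the same value more efficiently.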
We successfully developed a robust clinical prediction model and an accompanying web tool to estimate the 3-month mortality risk for patients with anti-MDA5+DM-ILD.
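As a rough illustration of what an LR-based risk tool computes, the sketch below maps the 8 predictors named in the abstract to a probability through the logistic function. It is not the published model or the web tool's implementation: the abstract does not report the fitted intercept or coefficients, so they are set to zero placeholders here, and the function name, dictionary keys, units, and patient values are all hypothetical.

```python
import math
from typing import Mapping


def logistic_risk(
    intercept: float,
    coefficients: Mapping[str, float],
    values: Mapping[str, float],
) -> float:
    """Return 1 / (1 + exp(-z)) with z = intercept + sum(coef * value)."""
    missing = set(coefficients) - set(values)
    if missing:
        raise KeyError(f"missing predictor values: {sorted(missing)}")
    z = intercept + sum(coefficients[k] * values[k] for k in coefficients)
    # Numerically stable evaluation for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


# Predictor names follow the abstract; the zeros are PLACEHOLDERS, not
# fitted values, so the output below carries no clinical meaning.
PLACEHOLDER_COEFFICIENTS = {
    "rp_ild": 0.0,  # 1 if RP-ILD is present, else 0 (assumed coding)
    "esr": 0.0,
    "alb": 0.0,
    "age": 0.0,
    "crp": 0.0,
    "ast": 0.0,
    "ldh": 0.0,
    "nlr": 0.0,
}
PLACEHOLDER_INTERCEPT = 0.0

# Hypothetical patient record, for demonstrating the call only.
example_patient = {
    "rp_ild": 1, "esr": 40.0, "alb": 30.0, "age": 55.0,
    "crp": 12.0, "ast": 60.0, "ldh": 400.0, "nlr": 8.0,
}

example_risk = logistic_risk(
    PLACEHOLDER_INTERCEPT, PLACEHOLDER_COEFFICIENTS, example_patient
)
```

With every coefficient at zero the function returns 0.5 for any patient, which makes it obvious that real predictions require the fitted parameters from the full paper.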