Department of Radiology, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA.
Mayo Clinic Alix School of Medicine, Mayo Clinic, Jacksonville, FL, USA.
Eur Radiol. 2022 Dec;32(12):8152-8161. doi: 10.1007/s00330-022-08875-4. Epub 2022 Jun 9.
To evaluate quantitative computed tomography (QCT) features and QCT feature-based machine learning (ML) models in classifying interstitial lung diseases (ILDs). To compare QCT-ML and deep learning (DL) models' performance.
We retrospectively identified 1085 patients with pathologically proven usual interstitial pneumonia (UIP), nonspecific interstitial pneumonia (NSIP), or chronic hypersensitivity pneumonitis (CHP) who underwent peri-biopsy chest CT. The Kruskal-Wallis test was used to evaluate the association of each QCT feature with the ILD diagnoses. QCT features, patient demographics, and pulmonary function test (PFT) results were used to train eXtreme Gradient Boosting classifiers (training/validation set n = 911), yielding 3 models: M1 = QCT features only; M2 = M1 plus age and sex; M3 = M2 plus PFT results. A DL model was also developed. Areas under the receiver operating characteristic curve (AUCs) with 95% confidence intervals (CIs) were compared between the ML and DL models for multiclass (UIP vs. NSIP vs. CHP) and binary (UIP vs. non-UIP) classification.
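As an illustration of the feature-screening step, the following is a minimal sketch of a Kruskal-Wallis test for one feature across three diagnostic groups. It is not the authors' code: the function name `kruskal_wallis` and the toy values are hypothetical, and the abstract does not say which multiple-comparison adjustment produced the adjusted p values, so no adjustment is applied here. With exactly three groups the chi-square approximation has 2 degrees of freedom, for which the p value has the closed form exp(-H/2).

```python
import math


def kruskal_wallis(groups):
    """Return (H, p) for a Kruskal-Wallis test across exactly three groups.

    The p value uses the chi-square approximation; exp(-H / 2) is the
    chi-square survival function for 2 degrees of freedom only.
    """
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")

    # Pool all observations, remembering which group each one came from.
    pooled = sorted((v, g) for g, values in enumerate(groups) for v in values)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0

    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j

    h = 12.0 / (n * (n + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3.0 * (n + 1)

    # Standard correction for ties.
    correction = 1.0 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    return h, math.exp(-h / 2.0)


# Toy values for one hypothetical QCT feature (not study data).
uip = [1.0, 2.0, 3.0]
nsip = [4.0, 5.0, 6.0]
chp = [7.0, 8.0, 9.0]
h_stat, p_value = kruskal_wallis([uip, nsip, chp])
print(f"H = {h_stat:.3f}, unadjusted p = {p_value:.3f}")
```

In a screen over many features, each p value would then be adjusted for multiple comparisons before applying the significance threshold.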
Most QCT features (69/78 [88%]) differed significantly among the 3 ILDs (adjusted p ≤ 0.05). All QCT-ML models achieved higher AUCs than the DL model (multiclass AUC micro-averages 0.910, 0.910, 0.925, and 0.798 and macro-averages 0.895, 0.893, 0.925, and 0.779 for M1, M2, M3, and DL, respectively; binary AUC 0.880, 0.899, 0.898, and 0.869 for M1, M2, M3, and DL, respectively). M3 performed statistically significantly better than M2 for multiclass prediction (∆AUC: 0.015, CI: [0.002, 0.029]).
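To make the distinction between micro- and macro-averaged multiclass AUC concrete, here is a minimal sketch under the common one-vs-rest convention. The abstract does not state how the averages were computed, so this convention is an assumption; the function names `binary_auc` and `multiclass_auc` and the toy probabilities are hypothetical. The macro-average is the unweighted mean of the per-class AUCs, whereas the micro-average pools every (patient, class) pair into a single binary problem.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg
    )
    return wins / (len(pos) * len(neg))


def multiclass_auc(y_true, proba, classes):
    """Return (micro, macro, per_class) one-vs-rest AUCs.

    y_true  : list of class names, one per patient
    proba   : list of rows, each giving predicted probabilities in the
              same order as `classes`
    """
    per_class = {}
    pooled_labels, pooled_scores = [], []
    for k, name in enumerate(classes):
        labels = [1 if y == name else 0 for y in y_true]
        scores = [row[k] for row in proba]
        per_class[name] = binary_auc(labels, scores)
        # The micro-average treats every (patient, class) pair as one case.
        pooled_labels.extend(labels)
        pooled_scores.extend(scores)
    macro = sum(per_class.values()) / len(classes)
    micro = binary_auc(pooled_labels, pooled_scores)
    return micro, macro, per_class


# Toy predictions (not study data).
classes = ["UIP", "NSIP", "CHP"]
y_true = ["UIP", "UIP", "NSIP", "NSIP", "CHP", "CHP"]
proba = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.3, 0.6, 0.1],
    [0.2, 0.3, 0.5],
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
]
micro, macro, per_class = multiclass_auc(y_true, proba, classes)
print(f"micro = {micro:.3f}, macro = {macro:.3f}")
```

Because the micro-average pools all cases, it is weighted toward the more frequent classes, while the macro-average gives each class equal weight; the two can therefore differ when class sizes are unbalanced.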
QCT features successfully differentiated pathologically proven UIP, NSIP, and CHP. Although QCT-based ML models outperformed a DL model in classifying ILDs, further investigation is warranted to determine whether QCT-ML, DL, or a combination of the two is superior for ILD classification.
• Quantitative CT features successfully differentiated pathologically proven UIP, NSIP, and CHP.
• Our quantitative CT-based machine learning models demonstrated high performance in classifying UIP, NSIP, and CHP histopathology, outperforming a deep learning model.
• While our quantitative CT-based machine learning models performed better than a DL model, additional investigations are needed to determine whether either or a combination of both approaches delivers superior diagnostic performance.