Wang Ping, Cui Jianing, Du Haoyuan, Qian Zhanhua, Zhan Huili, Zhang Heng, Ye Wei, Meng Wei, Bai Rongjie
Department of Radiology, Beijing Jishuitan Hospital, Capital Medical University, Beijing 100035, China (P.W., J.C., Z.Q., H.Z., H.Z., W.Y., R.B.).
Department of Orthopaedic Surgery, Beijing Jishuitan Hospital, Capital Medical University, Beijing 100035, China (H.D.).
Acad Radiol. 2025 Mar 25. doi: 10.1016/j.acra.2025.03.005.
Accurate preoperative prediction of spread through air spaces (STAS) in primary lung adenocarcinoma (LUAD) is critical for optimizing surgical strategies and improving patient outcomes.
This study aimed to develop a machine learning (ML)-based model to predict STAS from preoperative CT imaging features and clinicopathological data, and to enhance its interpretability through SHapley Additive exPlanations (SHAP) analysis.
This multicenter retrospective study included 1237 patients with pathologically confirmed primary LUAD from three hospitals. Patients from Center 1 (n=932) were divided into a training set (n=652) and an internal test set (n=280). Patients from Centers 2 (n=165) and 3 (n=140) formed external validation sets. CT imaging features and clinical variables were selected using Boruta and least absolute shrinkage and selection operator regression. Seven ML models were developed and evaluated using five-fold cross-validation. Performance was assessed using F1 score, recall, precision, specificity, sensitivity, and area under the receiver operating characteristic curve (AUC).
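The evaluation metrics listed above can be illustrated with a minimal, standard-library Python sketch. It is not the study's code: the function names, the 0.5 decision threshold, and the eight toy labels and scores are all assumptions made for illustration. Recall and sensitivity are the same quantity, so a single value covers both.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for binary labels (1 = STAS-positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # identical to sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case outscores a random
    negative case, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

The pairwise AUC computation is quadratic in the number of cases, which is fine for a sketch; a rank-based implementation would be preferable for cohorts of the size reported here.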
The Extreme Gradient Boosting (XGB) model achieved AUCs of 0.973 (training set), 0.862 (internal test set), and 0.842/0.810 (external validation sets). SHAP analysis identified nodule type, carcinoembryonic antigen, maximum nodule diameter, and lobulated sign as key features for predicting STAS. Logistic regression analysis confirmed these as independent risk factors.
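To make the idea behind the SHAP attributions concrete, the sketch below computes exact Shapley values by brute-force enumeration for a deliberately tiny scoring function. Everything in it is hypothetical: `toy_risk`, its weights, the binary feature encoding, and the all-zero baseline are illustrative assumptions, not the study's XGB model or its fitted effects. Practical SHAP analyses of tree ensembles rely on much faster tree-specific algorithms rather than enumeration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict


def shapley_values(model: Callable[[Dict[str, float]], float],
                   x: Dict[str, float],
                   baseline: Dict[str, float]) -> Dict[str, float]:
    """Exact Shapley values of each feature in x relative to a baseline input.

    Features outside a coalition are set to their baseline value.
    Cost grows exponentially with the number of features.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [m for m in names if m != name]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude `name`.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without = {m: (x[m] if m in subset else baseline[m]) for m in names}
                with_feature = dict(without)
                with_feature[name] = x[name]
                total += weight * (model(with_feature) - model(without))
        phi[name] = total
    return phi


def toy_risk(f: Dict[str, float]) -> float:
    """Hypothetical risk score with one interaction term (weights are made up)."""
    return (0.5 * f["solid_nodule"] + 0.25 * f["lobulated"]
            + 0.125 * f["cea_elevated"]
            + 0.25 * f["solid_nodule"] * f["lobulated"])


patient = {"solid_nodule": 1.0, "lobulated": 1.0, "cea_elevated": 1.0}
reference = {"solid_nodule": 0.0, "lobulated": 0.0, "cea_elevated": 0.0}
phi = shapley_values(toy_risk, patient, reference)
```

In this toy case the interaction term is split equally between the two interacting features, and the attributions sum to the difference between the patient's score and the baseline score, which is the additivity property that makes such explanations easy to read.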
The XGB model demonstrated high predictive accuracy and interpretability for STAS. By integrating widely available clinical and imaging features, this model offers a practical and effective tool for preoperative risk stratification, supporting personalized surgical planning in primary LUAD management.