Meng Leyuan, Zhu Ping, Xia Kaijian
Department of Respiratory and Critical Care Medicine, Affiliated Hospital of Nantong University, Medical School of Nantong University, Jiangsu, Nantong, China.
Department of Scientific Research, The Changshu Affiliated Hospital of Soochow University, Jiangsu, Suzhou, China.
Front Public Health. 2024 Apr 5;12:1368217. doi: 10.3389/fpubh.2024.1368217. eCollection 2024.
Accurately predicting the extent of lung tumor infiltration is crucial for improving patient survival and cure rates. This study evaluates the value of an improved CT index, derived by an artificial intelligence recognition system from the CT features of pulmonary nodules, combined with serum biomarkers, for the early prediction of lung cancer infiltration using machine learning models.
Clinical data were analyzed retrospectively for 803 patients hospitalized for lung cancer treatment from January 2020 to December 2023 at two hospitals: Hospital 1 (Affiliated Changshu Hospital of Soochow University) and Hospital 2 (Nantong Eighth People's Hospital). Data from Hospital 1 were used for internal training, and data from Hospital 2 were used for external validation. Five algorithms, including traditional logistic regression (LR) and machine learning techniques (generalized linear models [GLM], random forest [RF], gradient boosting machine [GBM], deep neural network [DL], and naive Bayes [NB]), were used to build and analyze models predicting early lung cancer infiltration. The models were evaluated by the area under the receiver operating characteristic curve (AUC) based on LR, calibration curves, and decision curve analysis (DCA), and were interpreted at the global and individual level using variable feature importance and SHapley Additive exPlanations (SHAP) plots.
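Decision curve analysis, mentioned above, compares models by their net benefit at a chosen risk threshold. The following is a minimal sketch of the standard net-benefit formula, not the authors' code; the function names and the toy labels and risks are hypothetical and are not study data.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating every patient whose predicted risk >= threshold.

    Standard DCA formula: TP/n - (FP/n) * pt / (1 - pt), where pt is the
    risk threshold at which a clinician would choose to intervene.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def treat_all_net_benefit(y_true, threshold):
    """Reference strategy: treat everyone regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data (1 = infiltrative, 0 = non-infiltrative).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_risks = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1, 0.3, 0.7]

if __name__ == "__main__":
    for pt in (0.2, 0.5, 0.8):
        model_nb = net_benefit(toy_labels, toy_risks, pt)
        all_nb = treat_all_net_benefit(toy_labels, pt)
        print(f"threshold={pt:.1f} model={model_nb:.3f} treat-all={all_nb:.3f}")
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero by definition.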
The training dataset comprised 560 patients used for model development, and the external validation dataset comprised 243 patients. The GBM model performed best among the five algorithms, with AUCs of 0.931 and 0.99 and accuracies of 0.857 and 0.955 in the validation and test sets, respectively. Nodule diameter and average CT value were the most important features for predicting lung cancer infiltration with the machine learning models.
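For readers unfamiliar with the two metrics reported above, the sketch below shows how AUC and accuracy can be computed from predicted risks. It is illustrative only: the function names and toy values are hypothetical, the 0.5 cut-off is an assumption, and nothing here reproduces the study's figures.

```python
def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of cases classified correctly at the given risk cut-off."""
    correct = sum(
        1 for y, s in zip(y_true, y_score) if (s >= threshold) == (y == 1)
    )
    return correct / len(y_true)


# Hypothetical toy data (1 = infiltrative, 0 = non-infiltrative).
example_labels = [1, 1, 1, 0, 0, 0, 0, 1]
example_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1, 0.3, 0.7]

if __name__ == "__main__":
    print("AUC:", roc_auc(example_labels, example_scores))
    print("Accuracy:", accuracy(example_labels, example_scores))
```

AUC depends only on how the model ranks cases, whereas accuracy depends on the chosen cut-off, which is why the two values can differ for the same model.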
The GBM model established in this study can effectively predict the risk of infiltration in patients with early-stage lung cancer. It can thereby improve the accuracy of lung cancer screening and help clinicians intervene promptly in patients with infiltrative lung cancer, supporting early diagnosis and treatment and ultimately reducing lung cancer-related mortality.