Machine learning algorithms for predicting malignancy grades of lung adenocarcinoma and guiding treatments: CT radiomics-based comparisons.

Authors

Zhu Jun, Tao Jiayu, Zhang Fengfeng, Yao Jie, Chen Ke, Wang Yuxuan, Lu Xiaochen, Ni Bin, Zhu Maoshan

Affiliations

Department of Thoracic Surgery, the First Affiliated Hospital of Soochow University, Suzhou, China.

Department of Oncology, the First Affiliated Hospital of Soochow University, Suzhou, China.

Publication information

J Thorac Dis. 2025 Apr 30;17(4):2423-2440. doi: 10.21037/jtd-2025-310. Epub 2025 Apr 28.

Abstract

BACKGROUND

Lung adenocarcinoma (LUAD) is the most frequently diagnosed subtype of non-small cell lung cancer (NSCLC). Notably, prognosis can vary significantly among LUAD patients with different tumor subtypes. The advent of radiomics and machine learning (ML) technologies enables the development of non-invasive pathology predictive models. We attempted to develop computed tomography (CT) radiomics-based diagnostic models, enhanced by ML, to predict LUAD malignancy grade and guide surgical strategies.

METHODS

In this retrospective analysis, a total of 168 surgical patients with histology-confirmed LUAD were divided into a low-risk group (n=93) and an intermediate-to-high-risk group (n=75) based on postoperative pathology. The region of interest (ROI) was delineated on the preoperative CT images of all patients, followed by the extraction of radiomic features. Patients were randomly allocated to a training set (n=117) and a testing set (n=51) in a 7:3 ratio. Within the training set, a clinical-radiological model (CM) and a radiomics model (RM) were developed using patients' clinical characteristics, radiological semantic features, and radiomic features, and Rad scores were calculated. After the Rad scores were combined with the independent risk factors among the clinical-radiological features, logistic regression (LR), decision tree (DT), random forest (RF), extreme gradient boosting (XGBoost), support vector machine (SVM), K-nearest neighbors (KNN), and naïve Bayes model (NBM) were employed to create different comprehensive models (COMs). The optimal model was identified based on the receiver operating characteristic (ROC) curves and the DeLong test. Finally, Shapley additive explanations (SHAP) were used to visualize the predictive processes of the models.
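The allocation and scoring steps can be illustrated with a minimal Python sketch. It is not the authors' code: the abstract does not say how the random split was seeded or how the Rad score was computed. The sketch assumes a simple (non-stratified) shuffle and assumes the Rad score is a linear combination of selected radiomic features, a common convention in radiomics studies. The function names, feature names, and weights are hypothetical.

```python
import random


def split_train_test(patient_ids, train_parts=7, total_parts=10, seed=0):
    """Randomly split patient IDs into a training list and a testing list."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    # Integer arithmetic avoids floating-point rounding: 168 * 7 // 10 == 117.
    n_train = len(ids) * train_parts // total_parts
    return ids[:n_train], ids[n_train:]


def rad_score(features, coefficients, intercept=0.0):
    """Assumed linear Rad score: intercept + sum(coefficient * feature value).

    The abstract does not state this formula; it is an illustrative assumption.
    """
    return intercept + sum(
        coefficients[name] * features[name] for name in coefficients
    )


# 168 patients split 7:3, matching the reported set sizes.
train_ids, test_ids = split_train_test(range(168))
print(len(train_ids), len(test_ids))  # 117 51

# Hypothetical feature names and weights, for illustration only.
example_coefficients = {"feature_a": 0.8, "feature_b": -0.5}
example_features = {"feature_a": 1.5, "feature_b": 0.4}
print(rad_score(example_features, example_coefficients, intercept=-0.2))
```

In a setup like this, the resulting Rad score would be passed to each classifier as one input alongside the clinical-radiological risk factors.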

RESULTS

Among the 168 patients enrolled, there were 50 males (29.76%) aged 56 (49.25, 67.00) years and 118 females (70.24%) aged 56.5 (42.00, 64.00) years. Diameter (P<0.001) and consolidation-to-tumor ratio (CTR) ≥0.5 (P=0.002) were identified as independent risk factors for the malignancy grade of LUAD during CM creation. The CM had an area under the ROC curve (AUC) of 0.909 [95% confidence interval (CI): 0.856-0.962] in the training set and 0.920 (95% CI: 0.846-0.994) in the testing set. The RM, comprising seven radiomic features, achieved an AUC of 0.961 (95% CI: 0.926-0.996) in the training set and 0.957 (95% CI: 0.905-1.000) in the testing set. Among the models created using the various ML algorithms, the XGBoost model was identified as the optimal model. SHAP visualization revealed the model prediction processes and the values of different features.
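For readers unfamiliar with the metric, the AUC is the probability that a randomly chosen positive case (here, an intermediate-to-high-risk tumor) receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition and adds a percentile bootstrap interval. The abstract does not state how its confidence intervals were obtained, so the bootstrap is a swapped-in illustration, not the authors' method. The data are toy values, not study data.

```python
import random


def auc(labels, scores):
    """AUC as P(positive score > negative score); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (illustrative assumption)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined when a resample has only one class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    # Approximate percentile positions, clamped to valid indices.
    lo_idx = max(0, int((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)
    return stats[lo_idx], stats[hi_idx]


# Toy labels and scores, not taken from the study.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(toy_labels, toy_scores))  # 0.75
print(bootstrap_auc_ci(toy_labels, toy_scores))
```

With only four toy cases the interval is very wide. With a testing set of 51 patients, as reported here, intervals are still fairly broad, which is consistent with the upper bound of 1.000 reported for the RM.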

CONCLUSIONS

We constructed and validated a robust, integrative model leveraging ML and CT radiomics, which combines radiomic, clinical, and radiological attributes to accurately identify LUADs with elevated postoperative pathological grades. This enables doctors to formulate different surgical plans before the operation according to the predicted pathology of the patients' tumors.
