Lin Mingzhi, Hui Yiming, Li Bin, Zhao Peilin, Zheng Zhizhong, Yang Zhuowen, Su Zhipeng, Meng Yuqi, Song Tieniu
Department of Thoracic Surgery, The Second Hospital & Clinical Medical School, Lanzhou University, Lanzhou 730030, China.
Zhongguo Fei Ai Za Zhi. 2025 Apr 20;28(4):281-290. doi: 10.3779/j.issn.1009-3419.2025.102.13.
BACKGROUND: Lung cancer is one of the most common malignant tumors worldwide and a major cause of cancer-related death. Early-stage lung cancer often presents as pulmonary nodules, and accurate assessment of malignancy risk is crucial for prolonging survival and avoiding overtreatment. This study aimed to construct models based on imaging feature parameters automatically extracted by artificial intelligence (AI) and to evaluate their effectiveness in predicting the malignancy of part-solid nodules (PSNs).
METHODS: This retrospective study analyzed 229 PSNs from 222 patients who underwent pulmonary nodule resection at Lanzhou University Second Hospital between October 2020 and February 2025. Based on pathological results, 45 benign lesions and precursor glandular lesions were assigned to the non-malignant group, and 184 pulmonary malignancies were assigned to the malignant group. All patients underwent preoperative chest computed tomography (CT), and AI software was used to extract imaging feature parameters. Univariate analysis was used to screen for significant variables, the variance inflation factor (VIF) was calculated to exclude highly collinear variables, and LASSO regression was then applied to identify key features. Multivariate logistic regression was used to determine independent risk factors. Based on the selected variables, five models were constructed: logistic regression, random forest, XGBoost, LightGBM, and support vector machine (SVM). Receiver operating characteristic (ROC) curves were used to assess model performance.
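The ROC-based evaluation described above summarizes each model by its area under the curve (AUC). As a minimal sketch, assuming binary labels (1 = malignant, 0 = non-malignant) and one predicted score per nodule, the AUC can be computed as the probability that a randomly chosen malignant nodule receives a higher score than a randomly chosen non-malignant one. The function name `roc_auc` and the toy data are hypothetical illustrations, not the study's software or data.

```python
def roc_auc(labels, scores):
    """Return the AUC as the fraction of (positive, negative) pairs in which
    the positive case has the higher score; tied pairs count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: three malignant and two non-malignant nodules.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2]
toy_auc = roc_auc(labels, scores)
print(f"toy AUC = {toy_auc:.3f}")
```

This pairwise form is quadratic in the number of cases, which is acceptable for a cohort of a few hundred nodules; a rank-based implementation would scale better for larger datasets.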
RESULTS: The independent risk factors for PSN malignancy were roughness (ngtdm), dependence variance (gldm), and short run low gray-level emphasis (glrlm). Logistic regression achieved areas under the curve (AUCs) of 0.86 and 0.89 in the training and testing sets, respectively, showing good performance. XGBoost had AUCs of 0.78 and 0.77, respectively, showing relatively balanced performance but lower accuracy. SVM had an AUC of 0.93 in the training set that fell to 0.80 in the testing set, indicating overfitting. LightGBM performed well in the training set, with an AUC of 0.94, but its AUC declined to 0.88 in the testing set. In contrast, random forest was stable across the training and testing sets, with AUCs of 0.89 and 0.91, respectively, indicating good generalizability.
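The overfitting comparison above rests on the difference between training and testing AUCs. The following sketch simply tabulates that gap from the values reported in this abstract; the variable and function names are illustrative assumptions, and a positive gap means the model scored lower on the testing set than on the training set.

```python
# (training AUC, testing AUC) as reported in the abstract.
reported_auc = {
    "Logistic regression": (0.86, 0.89),
    "Random forest": (0.89, 0.91),
    "XGBoost": (0.78, 0.77),
    "LightGBM": (0.94, 0.88),
    "SVM": (0.93, 0.80),
}


def auc_gaps(aucs):
    """Return training AUC minus testing AUC per model, rounded to 2 decimals."""
    return {name: round(train - test, 2) for name, (train, test) in aucs.items()}


gaps = auc_gaps(reported_auc)
largest_drop = max(gaps, key=gaps.get)  # model losing the most AUC on testing
best_test = max(reported_auc, key=lambda m: reported_auc[m][1])  # highest testing AUC

for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:20s} train - test = {gap:+.2f}")
```

On the reported figures, SVM shows the largest drop (0.13) and LightGBM the next largest (0.06), while random forest has the highest testing AUC (0.91).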
CONCLUSIONS: The random forest model built on the independent risk factors performed best in predicting the malignancy of PSNs and could provide clinicians with effective auxiliary predictions to support individualized treatment decisions.