Wang Wanling, Yang Bingqing, Wu Huan, Che Hebin, Tong Yue, Zhang Bozun, Liu Hongwu, Chen Yuanyuan
Medical Innovation Research Department of PLA General Hospital, Beijing, People's Republic of China.
Goodwill Hessian Health Technology Co. Ltd, Beijing, People's Republic of China.
J Multidiscip Healthc. 2025 Jun 27;18:3735-3748. doi: 10.2147/JMDH.S518166. eCollection 2025.
Lung cancer, one of the most lethal malignancies worldwide, often presents insidiously as pulmonary nodules. Its nonspecific clinical presentation and heterogeneous imaging characteristics hinder accurate differentiation between benign and malignant lesions, while the invasiveness and procedural constraints of biopsy underscore the critical need for non-invasive approaches to early diagnosis.
In this retrospective study, we analyzed outpatient and inpatient records from the First Medical Center of Chinese PLA General Hospital between 2011 and 2021, focusing on pulmonary nodules measuring 5 to 30 mm on CT scans without overt signs of malignancy. Pathological examination served as the reference standard. We compared five models: support vector machine (SVM), random forest (RF), extreme gradient boosting (XGBoost), a fully connected neural network (FNN), and an attention-based fully connected neural network (Atten_FNN), assessing the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. The dataset was split 70%/30% into a training set and a held-out set, and stratified five-fold cross-validation was applied to the training set. The best-performing model was interpreted with SHapley Additive exPlanations (SHAP) to identify the most influential predictive features.
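The evaluation protocol above (a 70%/30% split, then stratified five-fold cross-validation on the training portion) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: it assumes the 70%/30% split is also stratified by class (the abstract does not say), uses a synthetic label vector built only from the cohort's reported counts (1156 benign, 2199 malignant), and the helper names `stratified_split` and `stratified_kfold` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.3, seed=0):
    """Hold out test_frac of each class; return sorted (train_idx, test_idx).
    Stratifying this split is an assumption made for illustration."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def stratified_kfold(indices, labels, k=5, seed=0):
    """Yield (fit_idx, val_idx) for k folds over `indices`, dealing each class
    round-robin so every fold keeps roughly the same benign/malignant ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(k)]
    counter = 0  # shared across classes so fold sizes differ by at most one
    for idx in by_class.values():
        rng.shuffle(idx)
        for i in idx:
            folds[counter % k].append(i)
            counter += 1
    for f in range(k):
        fit = sorted(i for g in range(k) if g != f for i in folds[g])
        yield fit, sorted(folds[f])


# Synthetic labels with the cohort's class counts: 0 = benign, 1 = malignant.
labels = [0] * 1156 + [1] * 2199
train_idx, test_idx = stratified_split(labels, test_frac=0.3)
print(len(train_idx), len(test_idx))  # 2348 1007

# Each candidate model would be fit on fit_idx and scored on val_idx;
# here we only check that every validation fold keeps the malignant fraction.
val_ratios = [sum(labels[i] for i in val) / len(val)
              for _, val in stratified_kfold(train_idx, labels, k=5)]
```

Dealing indices round-robin with a counter shared across classes keeps both the class ratio and the fold sizes within one case of each other, which is the property stratified cross-validation is meant to guarantee.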
This study enrolled 3355 patients, including 1156 with benign and 2199 with malignant pulmonary nodules. The Atten_FNN model performed best in five-fold cross-validation, achieving an AUC of 0.82, accuracy of 0.75, sensitivity of 0.77, and F1 score of 0.80. SHAP analysis identified the key predictive factors as demographic variables (age, sex, and body mass index), CT-derived features (maximum nodule diameter, morphology, density, calcification, and ground-glass opacity), and laboratory biomarkers (neuroendocrine markers and carcinoembryonic antigen).
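The reported metrics all derive from a model's scores on labeled cases. The sketch below shows how they relate to the confusion matrix, treating malignant as the positive class. It assumes a 0.5 decision threshold (the abstract does not state one), and the labels and scores are toy values, not study data; the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) with malignant = 1 as the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn


def auc(y_true, scores):
    """Rank-based AUC: the probability that a randomly chosen malignant case
    scores higher than a randomly chosen benign one (ties count as 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true, scores, threshold=0.5):
    """Threshold-dependent metrics plus threshold-free AUC.
    Assumes both classes occur and at least one case is predicted malignant."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)  # recall on malignant nodules
    specificity = tn / (tn + fp)  # recall on benign nodules
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "auc": auc(y_true, scores),
    }


# Toy labels and model scores, purely illustrative (not study data).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]
metrics = evaluate(y_true, scores)
```

Under cross-validation, such metrics would typically be computed on each validation fold and then averaged. AUC does not depend on the threshold, whereas accuracy, sensitivity, specificity, and F1 (the harmonic mean of precision and sensitivity) all shift as the threshold moves.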
This study integrates electronic medical records and pathology data to predict pulmonary nodule malignancy with machine learning and deep learning models, and SHAP-based interpretability analysis identified the key clinical determinants. Acknowledging limited cross-center generalizability, we propose developing a multimodal diagnostic system that combines CT imaging and radiomics, to be validated in multicenter prospective cohorts to facilitate clinical translation. This framework establishes a novel paradigm for early precision diagnosis of lung cancer.
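SHAP attributes a single prediction to its input features using Shapley values from cooperative game theory. The brute-force sketch below computes exact Shapley values for a small model by enumerating every feature coalition; it is not the SHAP library's algorithm, which uses faster approximations because exact enumeration needs on the order of 2^n model evaluations. Assumptions: features left out of a coalition are set to a single hypothetical reference patient (SHAP usually averages over a background dataset instead), and `toy_risk`, its three features, and its coefficients are invented for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values for one prediction f(x).

    A coalition S is evaluated by keeping x[j] for j in S and replacing every
    other feature with baseline[j]. Cost grows as 2**n, so this only suits a
    handful of features."""
    n = len(x)

    def value(coalition):
        return f([x[j] if j in coalition else baseline[j] for j in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy risk score over [age_years, max_diameter_mm, ggo_flag];
# coefficients are invented and include one diameter x GGO interaction.
def toy_risk(z):
    age, diameter, ggo = z
    return 0.02 * age + 0.05 * diameter + 0.3 * ggo + 0.01 * diameter * ggo


patient = [65, 18, 1]
reference = [55, 10, 0]
phi = shapley_values(toy_risk, patient, reference)
```

The attributions sum exactly to `toy_risk(patient) - toy_risk(reference)` (the efficiency property), and the interaction term's contribution is split between diameter and GGO rather than credited to either alone, which is what makes Shapley-based rankings a principled way to summarize feature influence.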