Improving Lung Cancer Risk Prediction Using Machine Learning: A Comparative Analysis of Stacking Models and Traditional Approaches.

Author information

Tu Huakang, Zhao Yunfeng, Cui Jiameng, Lu Wanzhu, Sun Gege, Xu Xiaohang, Hu Qingfeng, Hu Kejia, Wu Ming, Wu Xifeng

Affiliations

Center of Clinical Big Data and Analytics of the Second Affiliated Hospital and School of Public Health, Zhejiang University School of Medicine, Hangzhou 310058, China.

Department of Thoracic Surgery, The Second Affiliated Hospital, Zhejiang University School of Medicine, 88 Jiefang Rd., Hangzhou 310009, China.

Publication information

Cancers (Basel). 2025 May 13;17(10):1651. doi: 10.3390/cancers17101651.

DOI:10.3390/cancers17101651

PMID:40427148

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12109916/

Abstract

BACKGROUND

Lung cancer is a leading cause of cancer-related mortality worldwide, often diagnosed in advanced stages, making early detection critical. This study aimed to evaluate the performance of various machine learning models in predicting lung cancer risk based on epidemiological questionnaires, comparing them with traditional logistic regression models.

METHODS

A retrospective case-control study was conducted using data from 5421 lung cancer cases and 10,831 matched controls. The dataset included a wide range of demographic, clinical, and behavioral risk factors from epidemiological questionnaires. We developed and compared multiple machine learning algorithms, including LightGBM and stacking ensemble models, alongside logistic regression for predicting lung cancer risk. Model performance was evaluated using accuracy, area under the curve (AUC), and recall.
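The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the data are synthetic (generated with scikit-learn's `make_classification`), the choice of base learners and meta-learner is an assumption, and `GradientBoostingClassifier` stands in for LightGBM so the example depends only on scikit-learn. It fits a stacking ensemble and a logistic regression, then reports the three metrics named in the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for questionnaire data (NOT the study data):
# about one case per two controls, echoing the 5421:10,831 ratio.
X, y = make_classification(
    n_samples=3000, n_features=20, n_informative=8,
    weights=[2 / 3, 1 / 3], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Assumed base learners; gradient boosting stands in for LightGBM.
base_learners = [
    ("gbm", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
]

# The meta-learner is fit on out-of-fold base-learner probabilities (cv=5).
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
    cv=5,
)
logit = LogisticRegression(max_iter=1000)

results = {}
for name, model in [("stacking", stack), ("logistic", logit)]:
    model.fit(X_train, y_train)
    prob = model.predict_proba(X_test)[:, 1]  # probability of the case class
    pred = model.predict(X_test)              # labels at the default 0.5 cut-off
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "auc": roc_auc_score(y_test, prob),
        "recall": recall_score(y_test, pred),
    }
    print(name, {k: round(v, 3) for k, v in results[name].items()})
```

Because the data are synthetic, the printed numbers say nothing about the study's results; the sketch only shows how the two kinds of model can be evaluated on the same held-out split with the same metrics.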

RESULTS

The stacking model outperformed traditional logistic regression, achieving an AUC of 0.887 (0.870-0.903) compared to 0.858 (0.839-0.878) for logistic regression. LightGBM also performed well, with an AUC of 0.884 (0.867-0.901). The stacking model achieved an accuracy of 81.2% and a recall of 0.755; its accuracy exceeded the 79.4% achieved by logistic regression. Compared to classical lung cancer prediction models (LLP and PLCO), the logistic regression and ML models improved AUC by 12% to 27%.
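To put the reported AUC values side by side, the short sketch below computes the absolute and relative gain of each ML model over logistic regression and checks whether the reported intervals overlap. The numbers are copied from the abstract; the helper names `auc_gain` and `intervals_overlap` are hypothetical, and the script makes no assumption about how the intervals were estimated.

```python
# AUC point estimates and intervals as reported in the abstract:
# (estimate, lower bound, upper bound).
reported = {
    "stacking": (0.887, 0.870, 0.903),
    "lightgbm": (0.884, 0.867, 0.901),
    "logistic": (0.858, 0.839, 0.878),
}


def auc_gain(model, baseline="logistic"):
    """Return (absolute, relative) AUC gain of `model` over `baseline`."""
    auc_m = reported[model][0]
    auc_b = reported[baseline][0]
    return auc_m - auc_b, (auc_m - auc_b) / auc_b


def intervals_overlap(a, b):
    """True if the reported intervals of models `a` and `b` share any value."""
    _, lo_a, hi_a = reported[a]
    _, lo_b, hi_b = reported[b]
    return lo_a <= hi_b and lo_b <= hi_a


for name in ("stacking", "lightgbm"):
    abs_gain, rel_gain = auc_gain(name)
    print(
        f"{name}: +{abs_gain:.3f} AUC ({rel_gain:.1%} relative), "
        f"interval overlaps logistic: {intervals_overlap(name, 'logistic')}"
    )
```

On the reported figures, stacking gains about 0.029 AUC over logistic regression (roughly 3.4% relative) and LightGBM about 0.026 (roughly 3.0%), and both intervals overlap the logistic regression interval. Overlap alone does not show whether the differences are statistically significant; a formal paired comparison would be needed for that.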

CONCLUSIONS

Integrating machine learning models into lung cancer screening programs can significantly enhance early detection efforts. Machine learning approaches, such as LightGBM and stacking, offer improved accuracy and predictive power over traditional models. However, efforts to enhance model interpretability through explainable AI techniques are necessary for broader clinical adoption.
