Prediction of lung cancer patient survival via supervised machine learning classification techniques.

Affiliations

Department of Computer Engineering and Computer Science, University of Louisville, KY, USA.

Department of Electrical and Computer Engineering, University of Louisville, KY, USA.

Publication information

Int J Med Inform. 2017 Dec;108:1-8. doi: 10.1016/j.ijmedinf.2017.09.013. Epub 2017 Sep 25.

DOI:10.1016/j.ijmedinf.2017.09.013

PMID:29132615

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5726571/

Abstract

Outcomes for cancer patients have previously been estimated by applying various machine learning techniques to large datasets such as the Surveillance, Epidemiology, and End Results (SEER) program database. For lung cancer in particular, it is not well understood which types of techniques yield more predictive information, or which data attributes should be used to obtain it. In this study, a number of supervised learning techniques are applied to the SEER database to classify lung cancer patients in terms of survival, including linear regression, Decision Trees, Gradient Boosting Machines (GBM), Support Vector Machines (SVM), and a custom ensemble. Key data attributes in applying these methods include tumor grade, tumor size, gender, age, stage, and number of primaries, with the goal of enabling comparison of predictive power between the various methods. The prediction is treated as a continuous target, rather than a classification into categories, as a first step towards improving survival prediction. The results show that the predicted values agree with actual values for low to moderate survival times, which constitute the majority of the data. The best performing technique was the custom ensemble, with a Root Mean Square Error (RMSE) value of 15.05. The most influential model within the custom ensemble was GBM, while Decision Trees may be inapplicable as they had too few discrete outputs. The results further show that among the five individual models generated, the most accurate was GBM, with an RMSE value of 15.32. Although SVM underperformed with an RMSE value of 15.82, statistical analysis singles out the SVM as the only model that generated a distinctive output. The results of the models are consistent with a classical Cox proportional hazards model used as a reference technique. We conclude that application of these supervised learning techniques to lung cancer data in the SEER database may be of use to estimate patient survival time, with the ultimate goal of informing patient care decisions, and that the performance of these techniques with this particular dataset may be on par with that of classical methods.
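
The RMSE comparison described in the abstract can be illustrated with a minimal sketch. It assumes an unweighted mean as the ensemble rule, which is a simplification and not necessarily how the authors built their custom ensemble. The survival times, the use of months as the unit, and the names `rmse`, `mean_ensemble`, `model_a`, and `model_b` are all hypothetical and are not taken from the paper or from SEER.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mean_ensemble(*model_predictions):
    """Unweighted per-patient average of several models' predictions.

    Assumption: a plain mean stands in for the paper's custom ensemble.
    """
    return [sum(values) / len(values) for values in zip(*model_predictions)]


# Hypothetical survival times in months (illustrative values, not SEER data).
actual = [6.0, 12.0, 24.0, 60.0]
model_a = [8.0, 10.0, 20.0, 40.0]  # stand-in for one regressor's output
model_b = [4.0, 16.0, 30.0, 44.0]  # stand-in for another regressor's output

ensemble = mean_ensemble(model_a, model_b)

for name, predicted in [("model_a", model_a), ("model_b", model_b), ("ensemble", ensemble)]:
    print(f"{name}: RMSE = {rmse(actual, predicted):.2f}")
```

In this toy data the largest error comes from the longest survival time, which mirrors the abstract's observation that predictions agree best with actual values for low to moderate survival times. A lower RMSE indicates closer agreement between predicted and actual survival.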

