
Comparison of the Tree-Based Machine Learning Algorithms to Cox Regression in Predicting the Survival of Oral and Pharyngeal Cancers: Analyses Based on SEER Database.

Author information

Du Mi, Haag Dandara G, Lynch John W, Mittinty Murthy N

Affiliations

School of Public Health, The University of Adelaide, 5005 Adelaide, Australia.

Robinson Research Institute, The University of Adelaide, 5005 Adelaide, Australia.

Publication details

Cancers (Basel). 2020 Sep 29;12(10):2802. doi: 10.3390/cancers12102802.

DOI: 10.3390/cancers12102802
PMID: 33003533
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7600270/
Abstract

This study aims to demonstrate the use of tree-based machine learning algorithms to predict the 3- and 5-year disease-specific survival of oral and pharyngeal cancers (OPCs) and to compare their performance with that of traditional Cox regression. A total of 21,154 individuals diagnosed with OPCs between 2004 and 2009 were obtained from the Surveillance, Epidemiology, and End Results (SEER) database. Three tree-based machine learning algorithms (survival tree (ST), random forest (RF) and conditional inference forest (CF)), together with a reference technique (Cox proportional hazards models (Cox)), were used to develop the survival prediction models. To handle missing values in predictors, we applied the substantive-model-compatible version of the fully conditional specification imputation approach to the Cox model, whereas we used RF to impute missing data for the ST, RF and CF models. For internal validation, we used 10-fold cross-validation with 50 iterations in the model development datasets. Model performance was then evaluated using the C-index, integrated Brier score (IBS) and calibration curves in the test datasets. For predicting the 3-year survival of OPCs with complete cases, the C-indices in the development sets were 0.77 (0.77, 0.77), 0.70 (0.70, 0.70), 0.83 (0.83, 0.84) and 0.83 (0.83, 0.86) for Cox, ST, RF and CF, respectively. Similar results were observed for the 5-year survival prediction models, with C-indices for Cox, ST, RF and CF of 0.76 (0.76, 0.76), 0.69 (0.69, 0.70), 0.83 (0.83, 0.83) and 0.85 (0.84, 0.86), respectively, in the development datasets. The prediction error curves based on IBS showed a similar pattern for these models. Predictive performance was unchanged in the analyses with imputed data. Additionally, a free web-based calculator was developed for potential clinical use.
In conclusion, compared with Cox regression, ST had lower, and RF and CF had higher, predictive accuracy in predicting 3- and 5-year OPC survival using SEER data. The RF and CF algorithms provide non-parametric alternatives to Cox regression that may be of clinical use for estimating the survival probability of patients with OPCs.
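Survival trees and random survival forests grow by recursively splitting patients into subgroups with different survival. The abstract does not state which splitting rule each model used; a common choice in random survival forests is the log-rank statistic. Below is a minimal pure-Python sketch of log-rank splitting on one numeric covariate, assuming right-censored data. The names `logrank_statistic`, `best_split` and `min_leaf` are hypothetical, and real forests add bootstrap sampling and random covariate selection on top of this. Conditional inference forests work differently: they choose the split variable with permutation-test p-values before picking a cut point, which this sketch does not cover.

```python
def logrank_statistic(times, events, in_left):
    """Two-sample log-rank chi-square statistic (1 degree of freedom).

    times   : observed follow-up times
    events  : 1 if the event was observed, 0 if censored
    in_left : True if the subject falls in the left child of a candidate split
    """
    num = 0.0  # sum over event times of (observed - expected) deaths, left child
    var = 0.0  # sum of the hypergeometric variances
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        n = sum(1 for ti in times if ti >= t)  # at risk overall
        n1 = sum(1 for ti, g in zip(times, in_left) if ti >= t and g)  # at risk, left
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)  # deaths at t
        d1 = sum(1 for ti, e, g in zip(times, events, in_left)
                 if ti == t and e and g)  # deaths at t, left child
        num += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return num * num / var if var > 0 else 0.0


def best_split(x, times, events, min_leaf=1):
    """Return (threshold, statistic) maximizing the log-rank statistic for x <= threshold.

    Exhaustively tries every observed value of one numeric covariate x and
    skips splits that leave fewer than min_leaf subjects in either child.
    """
    best_c, best_stat = None, 0.0
    for c in sorted(set(x))[:-1]:  # the largest value would leave the right child empty
        left = [xi <= c for xi in x]
        n_left = sum(left)
        if n_left < min_leaf or len(x) - n_left < min_leaf:
            continue
        stat = logrank_statistic(times, events, left)
        if stat > best_stat:
            best_c, best_stat = c, stat
    return best_c, best_stat
```

For example, with times [1, 2, 3, 10, 11, 12], every event observed and x = [1, 2, 3, 4, 5, 6], `best_split` returns threshold 2 with a statistic of about 5.63, while the cut at 3 scores about 5.05.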
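The internal validation step, 10-fold cross-validation with 50 iterations, can be illustrated with a split generator. This is a sketch that assumes the 50 iterations are 50 independent reshuffles of plain, unstratified folds; the abstract does not say how folds were formed or seeded, and the function name and `seed` parameter are hypothetical.

```python
import random


def repeated_kfold_indices(n, k=10, repeats=50, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Each repeat reshuffles the n sample indices and cuts them into k folds
    whose sizes differ by at most one, so every index lands in exactly one
    test fold per repeat.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    rng = random.Random(seed)  # seeded so the splits are reproducible
    base, extra = divmod(n, k)  # the first `extra` folds get one more index
    for r in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        start = 0
        for f in range(k):
            size = base + (1 if f < extra else 0)
            test = idx[start:start + size]
            train = idx[:start] + idx[start + size:]
            start += size
            yield r, f, train, test
```

With k=10 and repeats=50 this yields 500 train/test pairs; consuming the generator lazily avoids holding all of them in memory for a cohort of 21,154 patients.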
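The C-index measures how often a model ranks patients' risks in the same order as their observed survival. The abstract does not specify the variant or the handling of ties, so this sketch implements Harrell's pairwise definition and simply skips pairs with tied times; `harrell_c_index` is a hypothetical name. It loops over all O(n²) pairs, which is fine for illustration but too slow in pure Python for a full SEER cohort.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored data.

    times  : observed follow-up times
    events : 1 if the event (e.g. cancer death) was observed, 0 if censored
    risks  : predicted risk scores; higher means earlier expected failure

    A pair is comparable when the subject with the shorter time had an
    event; it is concordant when that subject also has the higher risk.
    Tied risks count one half. Pairs with tied times are skipped here.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        if times[i] < times[j]:
            a, b = i, j
        elif times[j] < times[i]:
            a, b = j, i
        else:
            continue  # tied times: skipped in this simplified version
        if not events[a]:
            continue  # shorter time was censored, so the true order is unknown
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

For models that output a survival probability at a fixed horizon (for example 3 years), one option is to pass 1 − S(3 years | x) as the risk score, since lower predicted survival should mean earlier failure.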
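The integrated Brier score (IBS) averages the squared error of predicted survival probabilities over follow-up time. The abstract does not give the estimator, so this sketch assumes the inverse-probability-of-censoring-weighted Brier score of Graf et al. (1999), with a Kaplan-Meier estimate of the censoring distribution, integrated by the trapezoidal rule over a user-chosen time grid and divided by the grid's length. The function names and the `pred[i][k]` layout are hypothetical.

```python
def km_censoring_survival(times, events):
    """Kaplan-Meier estimate G of the censoring survival function.

    Censored observations (events == 0) play the role of 'events' here.
    Returns two step functions: g_left(t) = G(t-) and g_right(t) = G(t).
    """
    steps = []  # (time, factor) pairs at times where censoring occurs
    for s in sorted(set(times)):
        at_risk = sum(1 for t in times if t >= s)
        censored = sum(1 for t, e in zip(times, events) if t == s and not e)
        if censored:
            steps.append((s, 1.0 - censored / at_risk))

    def g_right(t):
        g = 1.0
        for s, f in steps:
            if s <= t:
                g *= f
        return g

    def g_left(t):
        g = 1.0
        for s, f in steps:
            if s < t:
                g *= f
        return g

    return g_left, g_right


def brier_score(t, times, events, surv_at_t, g_left, g_right):
    """IPCW Brier score at time t; surv_at_t[i] is the predicted S(t | x_i)."""
    total = 0.0
    for ti, ei, s in zip(times, events, surv_at_t):
        if ti <= t and ei:
            total += s ** 2 / g_left(ti)  # event by t: ideal prediction is 0
        elif ti > t:
            total += (1.0 - s) ** 2 / g_right(t)  # still event-free: ideal is 1
        # censored at or before t: weight 0, compensated by the IPCW weights
    return total / len(times)


def integrated_brier_score(grid, times, events, pred):
    """Trapezoidal integral of the Brier score over grid, divided by its length.

    pred[i][k] is the predicted S(grid[k] | x_i); grid must be increasing.
    """
    g_left, g_right = km_censoring_survival(times, events)
    scores = [brier_score(t, times, events, [row[k] for row in pred],
                          g_left, g_right)
              for k, t in enumerate(grid)]
    area = sum((grid[k + 1] - grid[k]) * (scores[k] + scores[k + 1]) / 2
               for k in range(len(grid) - 1))
    return area / (grid[-1] - grid[0])
```

Lower is better: a perfect model scores 0, and a model that always predicts 0.5 scores 0.25 on uncensored data, a common reference point when reading prediction error curves.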

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b948/7600270/16de91f06cb8/cancers-12-02802-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b948/7600270/2b021b145e00/cancers-12-02802-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b948/7600270/6e97ecabccb0/cancers-12-02802-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b948/7600270/f0d5ce41fe39/cancers-12-02802-g004.jpg

Similar articles

1
Comparison of the Tree-Based Machine Learning Algorithms to Cox Regression in Predicting the Survival of Oral and Pharyngeal Cancers: Analyses Based on SEER Database.
Cancers (Basel). 2020 Sep 29;12(10):2802. doi: 10.3390/cancers12102802.
2
Which model is better in predicting the survival of laryngeal squamous cell carcinoma?: Comparison of the random survival forest based on machine learning algorithms to Cox regression: analyses based on SEER database.
Medicine (Baltimore). 2023 Mar 10;102(10):e33144. doi: 10.1097/MD.0000000000033144.
3
The prediction of the survival in patients with severe trauma during prehospital care: Analyses based on NTDB database.
Eur J Trauma Emerg Surg. 2024 Aug;50(4):1599-1609. doi: 10.1007/s00068-024-02484-0. Epub 2024 Mar 14.
4
Predicting Colorectal Cancer Survival Using Time-to-Event Machine Learning: Retrospective Cohort Study.
J Med Internet Res. 2023 Oct 26;25:e44417. doi: 10.2196/44417.
5
The prognostic value of machine learning techniques versus cox regression model for head and neck cancer.
Methods. 2022 Sep;205:123-132. doi: 10.1016/j.ymeth.2022.07.001. Epub 2022 Jul 4.
6
Prediction of survival in oropharyngeal squamous cell carcinoma using machine learning algorithms: A study based on the surveillance, epidemiology, and end results database.
Front Oncol. 2022 Aug 22;12:974678. doi: 10.3389/fonc.2022.974678. eCollection 2022.
7
Deep learning models for predicting the survival of patients with chondrosarcoma based on a surveillance, epidemiology, and end results analysis.
Front Oncol. 2022 Aug 22;12:967758. doi: 10.3389/fonc.2022.967758. eCollection 2022.
8
Development and validation of survival prediction model for gastric adenocarcinoma patients using deep learning: A SEER-based study.
Front Oncol. 2023 Mar 7;13:1131859. doi: 10.3389/fonc.2023.1131859. eCollection 2023.
9
Machine learning for predicting the survival in osteosarcoma patients: Analysis based on American and Hebei Province cohort.
Biomol Biomed. 2023 Sep 4;23(5):883-893. doi: 10.17305/bb.2023.8804.
10
Dementia risk prediction in individuals with mild cognitive impairment: a comparison of Cox regression and machine learning models.
BMC Med Res Methodol. 2022 Nov 2;22(1):284. doi: 10.1186/s12874-022-01754-y.

Cited by

1
Machine learning for the prediction of augmented renal clearance (ARC) in patients with sepsis in critical care units.
Sci Rep. 2025 Jul 18;15(1):26119. doi: 10.1038/s41598-025-11313-2.
2
Machine Learning Predictive Models for Survival in Patients with Brain Stroke.
Health Promot Perspect. 2025 May 6;15(1):63-72. doi: 10.34172/hpp.025.43635. eCollection 2025 May.
3
A prognostic model for highly aggressive prostate cancer using interpretable machine learning techniques.
Front Med (Lausanne). 2025 May 12;12:1512870. doi: 10.3389/fmed.2025.1512870. eCollection 2025.
4
Evaluation of Machine Learning and Traditional Statistical Models to Assess the Value of Stroke Genetic Liability for Prediction of Risk of Stroke Within the UK Biobank.
Healthcare (Basel). 2025 Apr 26;13(9):1003. doi: 10.3390/healthcare13091003.
5
Predicting Superaverage Length of Stay in COPD Patients with Hypercapnic Respiratory Failure Using Machine Learning.
J Inflamm Res. 2025 May 8;18:5993-6008. doi: 10.2147/JIR.S511092. eCollection 2025.
6
Machine learning model to predict sepsis in ICU patients with intracerebral hemorrhage.
Sci Rep. 2025 May 10;15(1):16326. doi: 10.1038/s41598-025-99431-9.
7
Predicting the risk of acute kidney injury in patients with acute pancreatitis complicated by sepsis using a stacked ensemble machine learning model: a retrospective study based on the MIMIC database.
BMJ Open. 2025 Feb 26;15(2):e087427. doi: 10.1136/bmjopen-2024-087427.
8
Developing clinical prognostic models to predict graft survival after renal transplantation: comparison of statistical and machine learning models.
BMC Med Inform Decis Mak. 2025 Feb 3;25(1):54. doi: 10.1186/s12911-025-02906-y.
9
Enhanced Lung Cancer Survival Prediction Using Semi-Supervised Pseudo-Labeling and Learning from Diverse PET/CT Datasets.
Cancers (Basel). 2025 Jan 17;17(2):285. doi: 10.3390/cancers17020285.
10
Integrating machine learning with bioinformatics for predicting idiopathic pulmonary fibrosis prognosis: developing an individualized clinical prediction tool.
Exp Biol Med (Maywood). 2024 Dec 23;249:10215. doi: 10.3389/ebm.2024.10215. eCollection 2024.