

Comparison of the Tree-Based Machine Learning Algorithms to Cox Regression in Predicting the Survival of Oral and Pharyngeal Cancers: Analyses Based on SEER Database.

Author information

Du Mi, Haag Dandara G, Lynch John W, Mittinty Murthy N

Affiliations

School of Public Health, The University of Adelaide, 5005 Adelaide, Australia.

Robinson Research Institute, The University of Adelaide, 5005 Adelaide, Australia.

Publication information

Cancers (Basel). 2020 Sep 29;12(10):2802. doi: 10.3390/cancers12102802.

Abstract

This study aims to demonstrate the use of tree-based machine learning algorithms to predict the 3- and 5-year disease-specific survival of oral and pharyngeal cancers (OPCs) and to compare their performance with traditional Cox regression. Data on 21,154 individuals diagnosed with OPCs between 2004 and 2009 were obtained from the Surveillance, Epidemiology, and End Results (SEER) database. Three tree-based machine learning algorithms (survival tree (ST), random forest (RF) and conditional inference forest (CF)), together with a reference technique (Cox proportional hazards models (Cox)), were used to develop the survival prediction models. To handle missing values in the predictors, we applied the substantive model compatible version of the fully conditional specification imputation approach for the Cox model, and used RF to impute missing data for the ST, RF and CF models. For internal validation, we used 10-fold cross-validation with 50 iterations in the model development datasets. Model performance was then evaluated using the C-index, integrated Brier score (IBS) and calibration curves in the test datasets. For predicting the 3-year survival of OPCs with the complete cases, the C-index values in the development sets were 0.77 (0.77, 0.77), 0.70 (0.70, 0.70), 0.83 (0.83, 0.84) and 0.83 (0.83, 0.86) for Cox, ST, RF and CF, respectively. Similar results were observed in the 5-year survival prediction models, with C-index values for Cox, ST, RF and CF of 0.76 (0.76, 0.76), 0.69 (0.69, 0.70), 0.83 (0.83, 0.83) and 0.85 (0.84, 0.86), respectively, in the development datasets. The prediction error curves based on IBS showed a similar pattern across these models. The predictive performance remained unchanged in the analyses with imputed data. Additionally, a free web-based calculator was developed for potential clinical use. In conclusion, compared with Cox regression, ST had lower and RF and CF had higher predictive accuracy in predicting 3- and 5-year OPC survival using SEER data. The RF and CF algorithms provide non-parametric alternatives to Cox regression that may be of clinical use for estimating the survival probability of OPC patients.
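The C-index used above to compare the models can be illustrated with a short example. The following is a minimal sketch of Harrell's concordance index for right-censored data, not the authors' implementation: the abstract does not state which estimator or software was used, and the function name, the tie handling and the toy data are assumptions made for illustration only.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times:       observed follow-up time for each subject
    events:      1 if the event was observed, 0 if the subject was censored
    risk_scores: model output where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simplified sketch
        # Order the pair so that subject a has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # the subject who failed earlier had higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: five subjects, two of them censored.
toy_times = [5, 10, 12, 20, 30]
toy_events = [1, 0, 1, 1, 0]
toy_risk = [0.9, 0.4, 0.2, 0.3, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values (for example 0.77 for Cox and 0.83 for RF at 3 years) should be read. This pairwise loop is quadratic in the number of subjects, so for a cohort of the size described here an optimized library routine would be the practical choice.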


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b948/7600270/16de91f06cb8/cancers-12-02802-g001.jpg
