School of Computational Science and Engineering, McMaster University, 1280 Main St W, Hamilton, ON, L8S 4L8, Canada.
Department of Oncology, McMaster University, 1280 Main St W, Hamilton, ON, L8S 4L8, Canada.
Sci Rep. 2023 Jan 25;13(1):1370. doi: 10.1038/s41598-023-28393-7.
The Cox proportional hazards model is commonly used to evaluate risk factors in cancer survival data. The model assumes an additive, linear relationship between the risk factors and the log hazard, an assumption that may be too simplistic. Further, failing to account for time-varying covariates, when present, may lower prediction accuracy. In this retrospective, population-based, prognostic study of patients diagnosed with cancer from 2008 to 2015 in Ontario, Canada, we applied machine learning-based time-to-event prediction methods and compared their predictive performance in two sets of analyses: (1) a yearly-cohort-based time-invariant analysis and (2) a fully time-varying covariates analysis. The machine learning-based methods, namely gradient boosting machine (gbm), random survival forest (rsf), elastic net (enet), lasso, and ridge, were compared with the traditional Cox proportional hazards (coxph) model and with the prior study, which used the yearly-cohort-based time-invariant analysis. Using Harrell's C index as our primary measure, we found that both using machine learning techniques and incorporating time-dependent covariates can improve predictive performance. The gradient boosting machine showed the best performance on test data in both the time-invariant and the time-varying covariates analyses.
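As an illustration of the primary measure named above, Harrell's C index can be computed as the fraction of comparable patient pairs in which the patient with the shorter observed survival time also received the higher predicted risk. The following is a minimal sketch, not the study's implementation: the function name `harrell_c_index`, the toy data, and the choice to skip pairs with tied observed times are all assumptions made for illustration.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's C for right-censored data (illustrative sketch).

    times       : observed follow-up times
    events      : 1 if the event was observed, 0 if censored
    risk_scores : predicted risk; higher means shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Assumption: pairs with tied observed times are skipped for simplicity.
        if times[i] == times[j]:
            continue
        # Let `a` be the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # If the shorter time is censored, the true ordering is unknown.
        if not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # higher risk failed first: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.8, 0.1]
toy_c = harrell_c_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. In practice an established implementation from a survival analysis library would usually be preferred, since tie handling and efficiency differ from this pairwise sketch.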