Golmakani Marzieh K, Polley Eric C
Pfizer Inc., San Diego, CA, USA.
Health Science Research, Mayo Clinic Minnesota, Rochester, Minnesota, USA.
Int J Biostat. 2020 Feb 22. doi: 10.1515/ijb-2019-0065.
Survival analysis is widely used to relate a time-to-event outcome to a set of potential covariates. Accurately predicting the time of an event of interest is of primary importance in survival analysis. Many different algorithms have been proposed for survival prediction. However, for a given prediction problem it is rarely, if ever, possible to know in advance which algorithm will perform best. In this paper we propose two algorithms for constructing super learners for survival data prediction, where the individual algorithms are based on proportional hazards. A super learner is a flexible approach to statistical learning that finds the best weighted ensemble of the individual algorithms. Choosing the combination of individual algorithms by minimizing cross-validated risk controls for over-fitting of the final ensemble learner. Candidate algorithms may range from a basic Cox model to tree-based machine learning algorithms, provided all candidate algorithms are based on the proportional hazards framework. The ensemble weights are estimated by minimizing the cross-validated negative log partial likelihood. We compare the performance of the proposed super learners with existing models through extensive simulation studies. In all simulation scenarios, the proposed super learners are either the best fit or near the best fit. The performance of the newly proposed algorithms is also demonstrated with clinical data examples.
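The weight-estimation step described above can be illustrated with a minimal sketch. It is not a reproduction of either of the two proposed algorithms. It assumes that each candidate algorithm has already produced cross-validated log relative hazards for every subject, that the ensemble is a convex combination of those predictions (non-negative weights summing to one), and that ties are handled in the Breslow style. The function names and the exponentiated-gradient optimizer are illustrative choices, not taken from the paper.

```python
import math


def neg_log_partial_likelihood(times, events, eta):
    """Cox negative log partial likelihood for linear predictors eta."""
    n = len(times)
    loss = 0.0
    for i in range(n):
        if events[i]:
            # Risk set: subjects still under observation at time t_i.
            risk = sum(math.exp(eta[j]) for j in range(n) if times[j] >= times[i])
            loss -= eta[i] - math.log(risk)
    return loss


def gradient_wrt_eta(times, events, eta):
    """Gradient of the negative log partial likelihood with respect to eta."""
    n = len(times)
    grad = [0.0] * n
    for i in range(n):
        if events[i]:
            at_risk = [j for j in range(n) if times[j] >= times[i]]
            denom = sum(math.exp(eta[j]) for j in at_risk)
            grad[i] -= 1.0
            for j in at_risk:
                grad[j] += math.exp(eta[j]) / denom
    return grad


def combine(cv_preds, weights):
    """Weighted sum of the candidates' predictions, one value per subject."""
    n = len(cv_preds[0])
    return [sum(w * p[i] for w, p in zip(weights, cv_preds)) for i in range(n)]


def fit_weights(times, events, cv_preds, steps=500, lr=0.1):
    """Assumed optimizer: exponentiated-gradient descent on the simplex."""
    k, n = len(cv_preds), len(times)
    w = [1.0 / k] * k  # start from equal weights
    for _ in range(steps):
        g_eta = gradient_wrt_eta(times, events, combine(cv_preds, w))
        # Chain rule: eta is linear in w, so dL/dw_k = sum_i dL/deta_i * pred_k[i].
        g_w = [sum(g * p for g, p in zip(g_eta, pred)) for pred in cv_preds]
        # Multiplicative update keeps weights positive; renormalize to sum to one.
        w = [wi * math.exp(-lr * g / n) for wi, g in zip(w, g_w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return w


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    times = [2.0, 3.5, 1.0, 6.0, 4.2]
    events = [1, 0, 1, 1, 1]  # 1 = event observed, 0 = censored
    learner_a = [0.8, 0.1, 1.5, -0.9, -0.2]  # hypothetical cross-validated predictions
    learner_b = [0.0, 0.4, -0.3, 0.2, 0.1]
    weights = fit_weights(times, events, [learner_a, learner_b])
    print("ensemble weights:", weights)
```

Because the ensemble predictor is linear in the weights and the negative log partial likelihood is convex in the predictor, the objective is convex over the simplex, so a simple first-order method is adequate for a sketch like this. A production implementation would more likely use a constrained solver and vectorized risk-set sums.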