Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Solna, Sweden.
J Biopharm Stat. 2022 Nov 2;32(6):858-870. doi: 10.1080/10543406.2022.2041655. Epub 2022 May 15.
Many strategies have been proposed for adapting machine learning algorithms to right-censored observations in survival data, with the aim of building more accurate risk prediction models. These adaptations include pre-processing steps such as pseudo-observation transformation of the survival outcome, or inverse probability of censoring weighted (IPCW) bootstrapping of the observed binary indicator of an event prior to a time point of interest. Such pre-processing steps allow existing or newly developed machine learning methods, which were not designed with time-to-event data in mind, to be applied to right-censored survival data to predict the risk of experiencing an event. Stacking or ensemble methods can improve risk predictions, but in general the combination of pseudo-observation-based algorithms, IPCW bootstrapping, direct IPC weighting of the methods, and methods developed specifically for survival data has not been considered within the same ensemble. In this paper, we propose an ensemble procedure based on the area under the pseudo-observation-based time-dependent ROC curve to optimally stack predictions from any survival or survival-adapted algorithm. Results from a real-data application show that the proposed method can improve on single survival-based methods such as survival random forest, and on other strategies that rely on a pre-processing step alone, such as inverse probability of censoring weighted bagging or pseudo-observations.
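To make the pseudo-observation pre-processing step concrete, the following is a minimal sketch of one common construction: jackknife (leave-one-out) pseudo-observations for the cumulative incidence at a fixed time point, built on the Kaplan-Meier estimator. The function names `km_survival` and `pseudo_observations` are hypothetical and chosen for illustration, and the choice of Kaplan-Meier as the base estimator is an assumption; this is not the authors' implementation and does not cover the proposed stacking procedure.

```python
import numpy as np

def km_survival(time, event, t):
    """Kaplan-Meier estimate of S(t) = P(T > t) (illustrative helper)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    surv = 1.0
    # Multiply the conditional survival factors over distinct event times <= t.
    for u in np.unique(time[event & (time <= t)]):
        at_risk = np.sum(time >= u)
        n_events = np.sum((time == u) & event)
        surv *= 1.0 - n_events / at_risk
    return surv

def pseudo_observations(time, event, t):
    """Jackknife pseudo-observations for the risk P(T <= t).

    For subject i: n * theta_hat - (n - 1) * theta_hat_without_i,
    where theta_hat = 1 - KM(t). Assumed construction, for illustration only.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    n = len(time)
    full = 1.0 - km_survival(time, event, t)
    keep = np.ones(n, dtype=bool)
    pseudo = np.empty(n)
    for i in range(n):
        keep[i] = False  # leave subject i out
        loo = 1.0 - km_survival(time[keep], event[keep], t)
        pseudo[i] = n * full - (n - 1) * loo
        keep[i] = True   # put subject i back
    return pseudo

# Toy data: event = 1 means the event was observed, 0 means right-censored.
time = [2, 3, 5, 7, 11]
event = [1, 0, 1, 1, 0]
po = pseudo_observations(time, event, t=5)
```

The resulting vector `po` is a continuous outcome with one value per subject, so it can be passed to a regression learner that knows nothing about censoring. When no observation is censored, each pseudo-observation reduces to the binary indicator that the event occurred by time `t`.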