Liu Mingyang, Li Hongzhe
Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States.
Front Genet. 2021 Jan 7;11:587378. doi: 10.3389/fgene.2020.587378. eCollection 2020.
Estimating and predicting the heterogeneous restricted mean survival time (hRMST) is of great clinical importance: the hRMST provides an easily interpretable, clinically meaningful summary of the survival function in the presence of censoring and individual covariates. Existing methods for modeling hRMST rely on proportional hazards or other parametric assumptions about the survival distribution. In this paper, we propose a random-forest-based estimator of hRMST for right-censored survival data with covariates and prove a central limit theorem for the resulting estimator. We also present a computationally efficient construction of confidence intervals for hRMST. Our simulations show that the resulting confidence intervals achieve the correct coverage probability for hRMST, and that the random-forest-based estimate has smaller prediction error than parametric models when those models are misspecified. We apply the method to the ovarian cancer data set from The Cancer Genome Atlas (TCGA) project to predict hRMST and show improved prediction performance over existing methods. A software implementation in R and C++, srf, is available at https://github.com/lmy1019/SRF.
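For readers unfamiliar with the quantity being estimated: the restricted mean survival time at a horizon tau is the expected value of min(T, tau), which equals the area under the survival curve from 0 to tau; the heterogeneous version conditions this on a subject's covariates. The sketch below is only an illustration of the unconditional (covariate-free) case, computed from a Kaplan-Meier curve on right-censored data. It is not the random forest estimator proposed in the paper and does not use the srf package; the function name `km_rmst` and its arguments are hypothetical.

```python
import numpy as np

def km_rmst(time, event, tau):
    """Area under the Kaplan-Meier survival curve on [0, tau].

    time  : observed follow-up times (event or censoring)
    event : 1/True if the event was observed, 0/False if censored
    tau   : restriction horizon (hypothetical argument name)
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)

    # Distinct observed event times that fall within the horizon.
    event_times = np.unique(time[event & (time <= tau)])

    surv = 1.0    # current value of the Kaplan-Meier curve
    prev_t = 0.0  # left end of the current step
    area = 0.0    # accumulated area under the step function

    for t in event_times:
        # Add the rectangle between the previous event time and this one.
        area += surv * (t - prev_t)
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & event)
        # Kaplan-Meier product-limit update at time t.
        surv *= 1.0 - deaths / at_risk
        prev_t = t

    # Final rectangle from the last event time up to tau.
    area += surv * (tau - prev_t)
    return area

# Toy data: four subjects, the second one censored at time 2.
rmst = km_rmst([1, 2, 3, 4], [1, 0, 1, 1], tau=4)
print(rmst)
```

With no censoring, this reduces to the sample mean of min(T, tau), which is a convenient sanity check for the implementation.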