Shimokawa Asanao, Kawasaki Yohei, Miyaoka Etsuo
Int J Biostat. 2015 May;11(1):175-88. doi: 10.1515/ijb-2014-0029.
We compare splitting methods for constructing survival trees, which model survival time as a function of covariates. Various authors have proposed splitting criteria based on the classification and regression tree (CART) framework, and we compare nine of these criteria through simulations. Comparative studies have been restricted to criteria that assume a non-parametric survival model for each terminal node of the final tree. Our main results are as follows. When the data appear to have a constant hazard over time, we recommend the criteria based on the exponential log-likelihood loss, the log-rank test statistic, the deviance residual under the proportional hazards model, or the squared error of the martingale residual. When the data are thought to have a decreasing hazard over time, the criterion based on the two-sample test statistic or the squared error of the deviance residual would be optimal. When the data are thought to have an increasing hazard over time, the criterion based on the exponential log-likelihood loss, or on an impurity measure that combines observed times and the proportion of censored observations, would be best. We also present results from a real medical study to show the utility of survival trees.
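To make one of the compared criteria concrete, the following is a minimal sketch of how a log-rank test statistic could score candidate splits on a single numeric covariate. It is not taken from the paper: the function names `logrank_statistic` and `best_split`, the use of midpoints between adjacent covariate values as thresholds, and the standard two-sample log-rank form with hypergeometric variance are all assumptions made for illustration.

```python
import numpy as np


def logrank_statistic(time, event, in_left):
    """Two-sample log-rank chi-square statistic (hypothetical helper).

    time    : observed times (event or censoring)
    event   : 1/True if the event was observed, 0/False if censored
    in_left : True for observations sent to the left child node
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    in_left = np.asarray(in_left, dtype=bool)

    observed_minus_expected = 0.0
    variance = 0.0
    # Loop over distinct event times only; censored times add no terms.
    for t in np.unique(time[event]):
        at_risk = time >= t
        n = at_risk.sum()                    # at risk in both children
        n_left = (at_risk & in_left).sum()   # at risk in the left child
        died = event & (time == t)
        d = died.sum()                       # events at time t
        d_left = (died & in_left).sum()      # events at time t, left child
        observed_minus_expected += d_left - d * n_left / n
        if n > 1:
            # Hypergeometric variance of d_left given the margins.
            variance += (d * (n_left / n) * (1 - n_left / n)
                         * (n - d) / (n - 1))
    if variance == 0.0:
        return 0.0
    return observed_minus_expected ** 2 / variance


def best_split(time, event, x):
    """Return (threshold, statistic) with the largest log-rank statistic.

    Candidate thresholds are midpoints between adjacent distinct values
    of x (an assumption of this sketch). Returns (None, 0.0) if no
    candidate separates the two children.
    """
    x = np.asarray(x, dtype=float)
    best = (None, 0.0)
    values = np.unique(x)
    for lo, hi in zip(values[:-1], values[1:]):
        threshold = (lo + hi) / 2
        stat = logrank_statistic(time, event, x <= threshold)
        if stat > best[1]:
            best = (threshold, stat)
    return best
```

A call such as `best_split(time, event, x)` returns the threshold whose two child nodes differ most in survival according to this statistic. A full survival tree would apply the search recursively over all covariates and add stopping and pruning rules, which this sketch omits.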