Schmid Matthias, Hothorn Torsten
Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstrasse 6, D-91054 Erlangen, Germany.
BMC Bioinformatics. 2008 Jun 6;9:269. doi: 10.1186/1471-2105-9-269.
When boosting algorithms are used for building survival models from high-dimensional data, it is common to fit a Cox proportional hazards model or to use least squares techniques for fitting semiparametric accelerated failure time models. There are cases, however, where fitting a fully parametric accelerated failure time model is a good alternative to these methods, especially when the proportional hazards assumption is not justified. Boosting algorithms for the estimation of parametric accelerated failure time models have not been developed so far, since these models require the estimation of a model-specific scale parameter which traditional boosting algorithms are not able to deal with.
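For readers unfamiliar with the scale parameter mentioned here, the standard form of a parametric accelerated failure time model can be written out as follows. The notation ($T$, $f$, $\sigma$, $W$, $\delta_i$, $z_i$) is assumed for illustration and is not taken from the abstract.

```latex
% Parametric AFT model: log survival time = predictor + scaled noise
\[
  \log T = f(\mathbf{x}) + \sigma W
\]
% W has a fully specified distribution with density f_W and survival
% function S_W: standard normal (log-normal model), standard extreme
% value (Weibull model), or standard logistic (log-logistic model).

% Negative log-likelihood for right-censored observations
% (t_i, \delta_i, \mathbf{x}_i), with \delta_i = 1 for an observed event,
% up to an additive constant that does not depend on f or \sigma:
\[
  -\ell(f, \sigma)
  = -\sum_{i=1}^{n} \Bigl[ \delta_i \bigl( -\log \sigma + \log f_W(z_i) \bigr)
    + (1 - \delta_i) \log S_W(z_i) \Bigr],
  \qquad
  z_i = \frac{\log t_i - f(\mathbf{x}_i)}{\sigma}
\]
```

Because $\sigma$ enters every term through $z_i$, the loss cannot be evaluated, and its gradient with respect to $f$ cannot be computed, without a current value of $\sigma$.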
We introduce a new boosting algorithm for censored time-to-event data which is suitable for fitting parametric accelerated failure time models. Estimation of the predictor function is carried out simultaneously with the estimation of the scale parameter, so that the negative log likelihood of the survival distribution can be used as a loss function for the boosting algorithm. The estimation of the scale parameter does not affect the favorable properties of boosting with respect to variable selection.
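The idea of alternating between a boosting step for the predictor and an update of the scale parameter can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a log-normal model, componentwise linear base learners, a fixed step length `nu`, and a golden-section search for the scale update. All function names (`boost_aft`, `update_sigma`, `neg_gradient`, `simulate`) and the simulated data are hypothetical.

```python
import math
import random

SQRT2 = math.sqrt(2.0)
LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)

def norm_pdf(z):
    return math.exp(-0.5 * z * z - LOG_SQRT_2PI)

def norm_sf(z):
    # Standard normal survival function 1 - Phi(z), floored to keep log() finite.
    return max(0.5 * math.erfc(z / SQRT2), 1e-300)

def neg_loglik(log_t, delta, f, sigma):
    # Negative log-likelihood of the log-normal AFT model (up to a constant).
    total = 0.0
    for y, d, fi in zip(log_t, delta, f):
        z = (y - fi) / sigma
        if d:   # event observed: density contribution
            total += math.log(sigma) + 0.5 * z * z + LOG_SQRT_2PI
        else:   # right-censored: survival contribution
            total -= math.log(norm_sf(z))
    return total

def neg_gradient(log_t, delta, f, sigma):
    # Negative derivative of the loss with respect to each predictor value f_i.
    u = []
    for y, d, fi in zip(log_t, delta, f):
        z = (y - fi) / sigma
        u.append((z if d else norm_pdf(z) / norm_sf(z)) / sigma)
    return u

def update_sigma(log_t, delta, f, lo=-3.0, hi=3.0, iters=25):
    # Golden-section search over log(sigma) with the predictor held fixed
    # (assumes the loss is unimodal in log(sigma) on [lo, hi]).
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if neg_loglik(log_t, delta, f, math.exp(c)) < neg_loglik(log_t, delta, f, math.exp(d)):
            b = d
        else:
            a = c
    return math.exp(0.5 * (a + b))

def boost_aft(X, log_t, delta, n_iter=150, nu=0.1):
    # Column 0 of Z is a constant, so the intercept competes as a base learner.
    # Assumes no covariate column is identically zero.
    Z = [[1.0] + list(row) for row in X]
    n, p = len(Z), len(Z[0])
    coef = [0.0] * p
    coef[0] = sum(log_t) / n            # offset: mean observed log time
    f = [coef[0]] * n
    sigma = update_sigma(log_t, delta, f)
    for _ in range(n_iter):
        u = neg_gradient(log_t, delta, f, sigma)
        best = None
        for j in range(p):              # least-squares fit of u on each column
            sxx = sum(row[j] ** 2 for row in Z)
            b = sum(row[j] * ui for row, ui in zip(Z, u)) / sxx
            rss = sum((ui - b * row[j]) ** 2 for row, ui in zip(Z, u))
            if best is None or rss < best[0]:
                best = (rss, j, b)
        _, j, b = best
        coef[j] += nu * b               # update only the best-fitting component
        f = [fi + nu * b * row[j] for fi, row in zip(f, Z)]
        sigma = update_sigma(log_t, delta, f)   # re-estimate the scale
    return coef, sigma

def simulate(n=150, p=5, sigma=0.5, seed=0):
    # Hypothetical data: two informative covariates, three noise covariates.
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    log_t, delta = [], []
    for row in X:
        event = 0.5 + row[0] - row[1] + sigma * rng.gauss(0.0, 1.0)
        cens = rng.gauss(1.5, 1.0)      # independent log censoring time
        log_t.append(min(event, cens))
        delta.append(1 if event <= cens else 0)
    return X, log_t, delta
```

Calling `coef, sigma = boost_aft(*simulate())` returns the intercept followed by one coefficient per covariate, together with the final scale estimate. Since only one component is updated per iteration, stopping early leaves the coefficients of rarely selected covariates at or near zero, which is where the variable selection behavior of boosting comes from. In a real analysis the number of iterations would be chosen by cross-validation or a similar criterion rather than fixed in advance.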
The analysis of a high-dimensional set of microarray data demonstrates that the new algorithm is able to outperform boosting with the Cox partial likelihood when the proportional hazards assumption is questionable. In low-dimensional settings, i.e., when classical likelihood estimation of a parametric accelerated failure time model is possible, simulations show that the new boosting algorithm closely approximates the estimates obtained from the maximum likelihood method.