Mogensen Ulla B, Ishwaran Hemant, Gerds Thomas A
Department of Biostatistics, University of Copenhagen, Denmark.
Department of Epidemiology and Public Health, University of Miami, USA.
J Stat Softw. 2012 Sep;50(11):1-23. doi: 10.18637/jss.v050.i11.
Prediction error curves are increasingly used to assess and compare predictions in survival analysis. This article surveys an R package that provides a set of functions for efficient computation of prediction error curves. The software implements inverse probability of censoring weights to deal with right-censored data and several variants of cross-validation to deal with the apparent error problem. In principle, all kinds of prediction models can be assessed. The package readily supports most traditional regression modeling strategies, such as Cox regression or additive hazard regression, as well as state-of-the-art machine learning methods such as random forests, a nonparametric method that provides a promising alternative to traditional strategies in low- and high-dimensional settings. We show how the functionality of the package can be extended to prediction models that are not yet supported. As an example, we implement support for random forest prediction models based on existing R packages. Using data from the Copenhagen Stroke Study, we use the package to compare random forests to a Cox regression model derived from stepwise variable selection. Reproducible results at the user level are given for publicly available data from the German breast cancer study group.
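To illustrate the idea behind inverse probability of censoring weights, the following is a minimal Python sketch of a weighted Brier score at a single time point, assuming a marginal Kaplan-Meier model for the censoring distribution and the convention that events precede censoring at tied times. It is not the package's implementation or API; the function names `censoring_survival` and `ipcw_brier` and the toy data are hypothetical.

```python
def censoring_survival(times, events):
    """Kaplan-Meier estimate of the censoring distribution G(t) = P(C > t).

    times  : observed times (event or censoring)
    events : 1 if the event was observed, 0 if the subject was censored
    Returns a function G(t, left_limit=False); left_limit=True gives G(t-).
    """
    steps = []  # pairs (time, value of G just after that time)
    g = 1.0
    for u in sorted(set(times)):
        n_events = sum(1 for t, e in zip(times, events) if t == u and e == 1)
        n_censored = sum(1 for t, e in zip(times, events) if t == u and e == 0)
        # Assumed tie convention: events at u leave the risk set first.
        at_risk = sum(1 for t in times if t >= u) - n_events
        if n_censored > 0:
            g *= 1.0 - n_censored / at_risk
        steps.append((u, g))

    def G(t, left_limit=False):
        value = 1.0
        for u, g_u in steps:
            if u < t or (u == t and not left_limit):
                value = g_u
            else:
                break
        return value

    return G


def ipcw_brier(times, events, surv_pred, t):
    """IPCW estimate of the Brier score at time t.

    surv_pred : predicted probability of surviving beyond t, one per subject
    """
    G = censoring_survival(times, events)
    total = 0.0
    for T, d, s in zip(times, events, surv_pred):
        if T <= t and d == 1:
            # Event observed by time t: the true status is "not surviving" (0).
            total += (0.0 - s) ** 2 / G(T, left_limit=True)
        elif T > t:
            # Still at risk after t: the true status is "surviving" (1).
            total += (1.0 - s) ** 2 / G(t)
        # Subjects censored by time t get weight zero.
    return total / len(times)


# Toy data: the second subject is censored at time 2.
times = [1, 2, 3, 4]
events = [1, 0, 1, 1]
score = ipcw_brier(times, events, [0.5, 0.5, 0.5, 0.5], t=2.5)
print(round(score, 6))
```

Evaluating such a score over a grid of time points, with model predictions for each time point, traces out a prediction error curve. Because the same data are used to fit and to evaluate a model in the apparent error, a cross-validated variant would compute the score on subjects left out of the fit.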