Biostatistics and Biomathematics, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
Stat Med. 2013 Apr 30;32(9):1467-82. doi: 10.1002/sim.5727. Epub 2013 Jan 7.
Authors have proposed new methodology in recent years for evaluating the improvement in prediction performance gained by adding a new predictor, Y, to a risk model containing a set of baseline predictors, X, for a binary outcome D. We prove theoretically that null hypotheses concerning no improvement in performance are equivalent to the simple null hypothesis that Y is not a risk factor when controlling for X, H0: P(D = 1 | X, Y) = P(D = 1 | X). Therefore, testing for improvement in prediction performance is redundant if Y has already been shown to be a risk factor. We also investigate properties of tests through simulation studies, focusing on the change in the area under the ROC curve (AUC). An unexpected finding is that standard testing procedures that do not adjust for variability in estimated regression coefficients are extremely conservative. This may explain why the AUC is widely considered insensitive to improvements in prediction performance and suggests that the problem of insensitivity has to do with use of invalid procedures for inference rather than with the measure itself. To avoid redundant testing and use of potentially problematic methods for inference, we recommend that hypothesis testing for no improvement be limited to evaluation of Y as a risk factor, for which methods are well developed and widely available. Analyses of measures of prediction performance should focus on estimation rather than on testing for no improvement in performance.
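The recommended workflow can be illustrated with a minimal sketch, which is not the authors' code. It assumes simulated data from a logistic model with one baseline predictor X and one new predictor Y. The coefficients, sample size, and all function names (`solve`, `fit_logistic`, `auc`, `run_example`) are hypothetical choices made for illustration. The sketch tests H0 with a likelihood ratio test of the nested logistic models, one standard way to evaluate Y as a risk factor, and then reports the change in AUC as an estimate only, with no test attached to it.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(rows, y, iters=25):
    """Fit a logistic model by Newton-Raphson; return (coefficients, log-likelihood)."""
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for x, d in zip(rows, y):
            eta = sum(b * v for b, v in zip(beta, x))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (d - mu) * x[j]
                for k in range(p):
                    hess[j][k] += w * x[j] * x[k]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    loglik = 0.0
    for x, d in zip(rows, y):
        eta = sum(b * v for b, v in zip(beta, x))
        loglik += d * eta - math.log(1.0 + math.exp(eta))
    return beta, loglik


def auc(scores, y):
    """Empirical AUC: chance that a case outscores a control, ties counted as 1/2."""
    cases = [s for s, d in zip(scores, y) if d == 1]
    controls = [s for s, d in zip(scores, y) if d == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def run_example(seed=1, n=1000, beta_y=0.7):
    """Simulate data (assumed model), test H0 by likelihood ratio, estimate delta AUC."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rng.gauss(0, 1) for _ in range(n)]
    # Assumed true model: logit P(D = 1 | X, Y) = -1 + 1.0 * X + beta_y * Y
    d = [int(rng.random() < 1 / (1 + math.exp(-(-1 + x + beta_y * v))))
         for x, v in zip(xs, ys)]
    base_rows = [[1.0, x] for x in xs]
    full_rows = [[1.0, x, v] for x, v in zip(xs, ys)]
    b0, ll0 = fit_logistic(base_rows, d)
    b1, ll1 = fit_logistic(full_rows, d)
    # Likelihood ratio statistic for H0: P(D = 1 | X, Y) = P(D = 1 | X)
    lr = max(2.0 * (ll1 - ll0), 0.0)
    p_value = math.erfc(math.sqrt(lr / 2.0))  # chi-square survival function, 1 df
    # The linear predictor ranks subjects the same way as the fitted risk
    auc_base = auc([sum(b * v for b, v in zip(b0, r)) for r in base_rows], d)
    auc_full = auc([sum(b * v for b, v in zip(b1, r)) for r in full_rows], d)
    return {"lr_stat": lr, "p_value": p_value,
            "auc_base": auc_base, "auc_full": auc_full}


if __name__ == "__main__":
    for name, value in run_example().items():
        print(f"{name}: {value:.4f}")
```

The AUC values here are computed on the same data used to fit the models, so they are optimistic. In practice the estimated change in AUC would be reported with a confidence interval, for example from a bootstrap that refits both models in each resample so that variability in the estimated coefficients is reflected.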