Van Calster Ben, Steyerberg Ewout W, D'Agostino Ralph B, Pencina Michael J
KU Leuven Department of Development and Regeneration, Leuven, Belgium (BVC).
Department of Biostatistics, Boston University, Boston, MA (BVC, RBD, MJP).
Med Decis Making. 2014 May;34(4):513-22. doi: 10.1177/0272989X13513654. Epub 2013 Dec 30.
When comparing prediction models, it is essential to estimate the magnitude of change in performance rather than rely solely on statistical significance. In this paper we investigate measures that estimate change in classification performance, assuming 2-group classification based on a single risk threshold. We study the value of a new biomarker when added to a baseline risk prediction model. First, simulated data are used to investigate the change in sensitivity and specificity (ΔSe and ΔSp). Second, the influence of ΔSe and ΔSp on the net reclassification improvement (NRI; sum of ΔSe and ΔSp) and on decision-analytic measures (net benefit or relative utility) is studied. We assume normal distributions for the predictors and assume correctly specified models such that the extended model has a dominating receiver operating characteristic curve relative to the baseline model. Remarkably, we observe that even when a strong marker is added it is possible that either sensitivity (for thresholds below the event rate) or specificity (for thresholds above the event rate) decreases. In these cases, decision-analytic measures provide more modest support for improved classification than NRI, even though all measures confirm that adding the marker improved classification accuracy. Our results underscore the necessity of reporting ΔSe and ΔSp separately. When a single summary is desired, decision-analytic measures allow for a simple incorporation of the misclassification costs.
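The relationships described above can be illustrated with a minimal analytic sketch. It is not the authors' simulation code. It assumes an equal-variance binormal setting: the baseline predictor and the new marker are independent, each normally distributed with unit variance, with group means separated by `d_base` and `d_marker` standard deviations. Under that assumption the correctly specified model's linear predictor is again normal, so sensitivity and specificity at a risk threshold have closed forms. The function names, the parameter names, and the numeric values (event rate 0.3, threshold 0.1, separations 0.5 and 0.8) are all hypothetical choices made for illustration.

```python
from math import log, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution function


def logit(q):
    return log(q / (1.0 - q))


def se_sp(d, event_rate, threshold):
    """Sensitivity and specificity of the rule 'true risk > threshold'.

    Assumes the model's linear predictor is normal with unit variance and
    separates events from non-events by d standard deviations (d > 0).
    """
    k = (logit(threshold) - logit(event_rate)) / d
    sensitivity = PHI(d / 2 - k)
    specificity = PHI(d / 2 + k)
    return sensitivity, specificity


def compare(d_base, d_marker, event_rate, threshold):
    """Change in classification performance when an independent marker is added."""
    # Independent unit-variance predictors: separations combine in quadrature.
    d_ext = sqrt(d_base ** 2 + d_marker ** 2)
    se0, sp0 = se_sp(d_base, event_rate, threshold)
    se1, sp1 = se_sp(d_ext, event_rate, threshold)
    d_se, d_sp = se1 - se0, sp1 - sp0
    odds = threshold / (1.0 - threshold)  # harm-to-benefit ratio implied by the threshold
    return {
        "dSe": d_se,
        "dSp": d_sp,
        "NRI": d_se + d_sp,  # two-category NRI: sum of dSe and dSp
        # Change in net benefit: dSp is weighted by the threshold odds.
        "dNB": event_rate * d_se + (1.0 - event_rate) * odds * d_sp,
    }


# Illustrative values only: threshold (0.1) below the event rate (0.3).
result = compare(d_base=0.5, d_marker=0.8, event_rate=0.3, threshold=0.1)
for name, value in result.items():
    print(f"{name}: {value:+.4f}")
```

With these illustrative inputs the threshold lies below the event rate, and the sketch shows sensitivity decreasing while specificity, NRI, and net benefit all increase. Because net benefit weights ΔSp by the threshold odds (here 1/9), the gain in net benefit is far smaller than the NRI suggests, which is the pattern the abstract describes.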