LeDell Erin, Petersen Maya, van der Laan Mark
Division of Biostatistics, University of California, Berkeley, Berkeley, CA 94720, USA.
Electron J Stat. 2015;9(1):1583-1607. doi: 10.1214/15-EJS1035.
In binary classification problems, the area under the ROC curve (AUC) is commonly used to evaluate the performance of a prediction model. It is often combined with cross-validation to assess how well the results generalize to an independent data set. To evaluate the quality of a cross-validated AUC estimate, we estimate its variance. For massive data sets, generating even a single performance estimate can be computationally expensive. Additionally, when a complex prediction method is used, cross-validating a predictive model can require substantial computation time even on a relatively small data set. Because the bootstrap repeats this entire procedure on many resampled data sets, it is a computationally intractable approach to variance estimation in many practical settings. As an alternative to the bootstrap, we demonstrate a computationally efficient, influence-curve-based approach to estimating the variance of cross-validated AUC.
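The influence-curve idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the standard influence curve of the empirical AUC. Each positive observation contributes (fraction of negatives it outranks minus AUC)/p. Each negative observation contributes (fraction of positives that outrank it minus AUC)/(1 - p). Here p is the overall proportion of positives, and ties count as one half. The cross-validated AUC is taken as the mean of the per-fold AUCs, and its standard error as sqrt(mean(IC²)/n) over all n validation observations. The function names `auc`, `auc_influence_curve` and `cv_auc_with_se` are hypothetical. The simulated scores stand in for out-of-fold model predictions.

```python
import numpy as np

def auc(scores, labels):
    """Empirical AUC: share of (positive, negative) pairs ranked correctly, ties count 1/2."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]  # all pairwise score differences (fine for a sketch)
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

def auc_influence_curve(scores, labels, p):
    """Estimated influence-curve value of each observation for the empirical AUC.

    Assumption: p is the proportion of positives estimated on the full data set.
    """
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    a = auc(scores, labels)
    ic = np.empty(len(scores), dtype=float)
    for i, (s, y) in enumerate(zip(scores, labels)):
        if y == 1:
            # share of negatives that this positive outranks
            frac = (np.sum(neg < s) + 0.5 * np.sum(neg == s)) / len(neg)
            ic[i] = (frac - a) / p
        else:
            # share of positives that outrank this negative
            frac = (np.sum(pos > s) + 0.5 * np.sum(pos == s)) / len(pos)
            ic[i] = (frac - a) / (1 - p)
    return ic

def cv_auc_with_se(fold_scores, fold_labels):
    """Cross-validated AUC (mean of fold AUCs) and its influence-curve-based standard error."""
    all_labels = np.concatenate(fold_labels)
    n = len(all_labels)
    p = all_labels.mean()
    fold_aucs = [auc(s, y) for s, y in zip(fold_scores, fold_labels)]
    ics = np.concatenate(
        [auc_influence_curve(s, y, p) for s, y in zip(fold_scores, fold_labels)]
    )
    sigma2 = np.mean(ics ** 2)  # estimated variance of the influence curve
    return float(np.mean(fold_aucs)), float(np.sqrt(sigma2 / n))

# Illustrative usage on simulated data (hypothetical; real scores would be
# predictions from a model fit on the other folds).
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
s = y + rng.normal(size=500)  # noisy scores, higher on average for positives
folds = np.arange(500) % 5
est, se = cv_auc_with_se([s[folds == v] for v in range(5)],
                         [y[folds == v] for v in range(5)])
print(f"cvAUC = {est:.3f}, 95% CI = ({est - 1.96 * se:.3f}, {est + 1.96 * se:.3f})")
```

The influence-curve values come directly from the out-of-fold predictions, so no model is refit. This is where the savings over the bootstrap come from, since the bootstrap would repeat the whole cross-validation procedure for every resample. The pairwise comparisons here are quadratic in fold size, and a sorting-based version would scale better to large data sets.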