Yousef Waleed A, Wagner Robert F, Loew Murray H
Food and Drug Administration, Center for Devices and Radiological Health, Rockville, MD 20852, USA.
IEEE Trans Pattern Anal Mach Intell. 2006 Nov;28(11):1809-17. doi: 10.1109/TPAMI.2006.218.
This paper considers binary classification. We assess a classifier in terms of the area under the ROC curve (AUC). We estimate three important parameters: the conditional AUC (conditional on a particular training set) and the mean and variance of this AUC. We also derive a closed-form expression for the variance of the estimator of the AUC. This expression exhibits several components of variance that facilitate an understanding of the sources of uncertainty in that estimate. In addition, we estimate this variance, i.e., the variance of the conditional AUC estimator. Our approach is nonparametric and based on general methods from U-statistics; it addresses the case where the data distribution is neither known nor modeled and where only two data sets are available, the training and testing sets. Finally, we present simulation results for these estimators.
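As a rough illustration of the kind of nonparametric estimate the abstract describes, the sketch below computes the AUC on a test set as a two-sample U-statistic (the Mann-Whitney form) and estimates its variance with respect to the finite test set only, i.e., conditional on a fixed training set. The variance formula used here is the well-known DeLong-style structural-components estimator, swapped in as a stand-in; it is not the closed-form expression derived in the paper, and it does not capture variability due to the training set. The function names and the input convention (higher score means more likely positive) are assumptions made for this example.

```python
import numpy as np


def auc_u_statistic(neg_scores, pos_scores):
    """Mann-Whitney (two-sample U-statistic) estimate of the AUC.

    Assumption: higher scores indicate the positive class.
    Returns the AUC estimate and the full kernel matrix.
    """
    neg = np.asarray(neg_scores, dtype=float)
    pos = np.asarray(pos_scores, dtype=float)
    # Kernel psi(x, y): 1 if x < y, 0.5 if x == y, 0 otherwise,
    # evaluated for every (negative, positive) pair.
    diff = pos[None, :] - neg[:, None]
    kernel = (diff > 0).astype(float) + 0.5 * (diff == 0)
    return kernel.mean(), kernel


def conditional_auc_variance(kernel):
    """DeLong-style variance estimate for the test-set AUC.

    This treats the classifier (training set) as fixed, so it reflects
    only the finite size of the test set. It is an illustrative
    stand-in, not the paper's variance expression.
    """
    n_neg, n_pos = kernel.shape
    v_neg = kernel.mean(axis=1)  # one structural component per negative case
    v_pos = kernel.mean(axis=0)  # one structural component per positive case
    return v_neg.var(ddof=1) / n_neg + v_pos.var(ddof=1) / n_pos


if __name__ == "__main__":
    # Hypothetical test-set scores produced by an already trained classifier.
    neg = [1.0, 2.0, 3.0]
    pos = [2.0, 3.0, 4.0]
    auc, kernel = auc_u_statistic(neg, pos)
    print("AUC estimate:", auc)
    print("Variance estimate:", conditional_auc_variance(kernel))
```

The full kernel matrix is kept so that the same pairwise comparisons serve both the point estimate and the variance estimate; for large test sets a rank-based computation would use less memory.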