Technology Division, SERC Rutherford and Appleton Laboratories, Chilton, England.
IEEE Trans Pattern Anal Mach Intell. 1982 Feb;4(2):215-20. doi: 10.1109/tpami.1982.4767229.
The problem of estimating the error probability of a given classification system is considered. Statistical properties of the empirical error count (C) and the average conditional error (R) estimators are studied. In the large-sample case, the R estimator is shown to be unbiased, with a variance smaller than that of the C estimator. In contrast to conventional methods of Bayes error estimation, the unbiasedness of the R estimator for a given classifier can be obtained only at the price of an additional set of classified samples. On small test sets, the R estimator may be subject to a pessimistic bias caused by the averaging inherent in conditional error estimators.
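The difference between the two estimators can be illustrated with a small simulation. The following is a minimal sketch, not the paper's experiment: it assumes two equal-prior univariate Gaussian classes, a fixed threshold classifier, and exact knowledge of the class posteriors (in practice these would have to be estimated, which is where the additional set of classified samples comes in). All function names, the threshold 0.3, and the sample sizes are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def posterior_class1(x, mu0=-1.0, mu1=1.0, sigma=1.0):
    """P(class 1 | x) for two equal-prior Gaussians with a shared sigma (assumed model)."""
    # Log-likelihood ratio log p(x|1) / p(x|0) for equal variances.
    llr = (mu1 - mu0) * (x - 0.5 * (mu0 + mu1)) / sigma ** 2
    return 1.0 / (1.0 + math.exp(-llr))


def classify(x, threshold=0.3):
    """The 'given classifier': a fixed threshold, deliberately not the Bayes rule."""
    return 1 if x > threshold else 0


def draw_test_set(n, rng):
    """Draw n labelled samples (x, label) from the assumed two-class model."""
    samples = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = rng.gauss(1.0 if label else -1.0, 1.0)
        samples.append((x, label))
    return samples


def error_count_estimate(samples):
    """C estimator: fraction of test samples the classifier gets wrong (uses labels)."""
    return sum(classify(x) != y for x, y in samples) / len(samples)


def average_conditional_error_estimate(samples):
    """R estimator: mean of the conditional error r(x); labels are not used."""
    total = 0.0
    for x, _ in samples:
        p1 = posterior_class1(x)
        # r(x) is the posterior probability of the class that was NOT chosen.
        total += (1.0 - p1) if classify(x) == 1 else p1
    return total / len(samples)


def true_error(threshold=0.3):
    """Exact error probability of the threshold classifier under the assumed model."""
    def phi(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # 0.5 * P(x > t | class 0) + 0.5 * P(x <= t | class 1)
    return 0.5 * (1.0 - phi(threshold + 1.0)) + 0.5 * phi(threshold - 1.0)


def simulate(n_test=50, n_trials=2000, seed=0):
    """Repeat both estimates over many independent test sets."""
    rng = random.Random(seed)
    c_vals, r_vals = [], []
    for _ in range(n_trials):
        samples = draw_test_set(n_test, rng)
        c_vals.append(error_count_estimate(samples))
        r_vals.append(average_conditional_error_estimate(samples))
    return c_vals, r_vals


if __name__ == "__main__":
    c_vals, r_vals = simulate()
    print("true error:", true_error())
    print("C mean, variance:", statistics.mean(c_vals), statistics.variance(c_vals))
    print("R mean, variance:", statistics.mean(r_vals), statistics.variance(r_vals))
```

With exact posteriors, both estimators average close to the true error and R shows the smaller spread across test sets, since r(x) lies in [0, 1] and so its variance cannot exceed that of a 0/1 error indicator with the same mean. The sketch does not reproduce the small-sample pessimistic bias described above, which arises when the conditional error itself has to be estimated.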