Tötsch Niklas, Hoffmann Daniel
Faculty of Biology, University of Duisburg-Essen, Essen, Germany.
PeerJ Comput Sci. 2021 Mar 4;7:e398. doi: 10.7717/peerj-cs.398. eCollection 2021.
Classifiers are often tested on relatively small data sets, which makes their performance metrics uncertain. Nevertheless, these metrics are usually taken at face value. We present an approach to quantify the uncertainty of classification performance metrics, based on a probability model of the confusion matrix. Applying our approach to classifiers from the scientific literature and a classification competition shows that uncertainties can be surprisingly large and can limit performance evaluation. In fact, some published classifiers may be misleading. The approach is simple to apply, requires only the confusion matrix, and is agnostic to the underlying classifier. Our method can also be used to estimate the sample size needed to achieve a desired precision of a performance metric.
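The abstract does not spell out the probability model, so the following is only a minimal sketch of one natural instantiation, not the authors' exact method: treat the four confusion-matrix counts as a multinomial draw, place a flat Dirichlet(1,1,1,1) prior on the cell probabilities (an assumption made here for illustration), and read metric uncertainty off the posterior samples. The function name and prior choice are hypothetical; only NumPy is assumed.

```python
import numpy as np

def metric_credible_intervals(tp, fn, fp, tn, n_samples=100_000, level=0.95, seed=0):
    """Posterior credible intervals for metrics derived from a confusion matrix.

    Sketch only: posterior over cell probabilities is Dirichlet(counts + 1),
    i.e. a flat Dirichlet prior updated with the observed counts. The paper's
    actual model may differ.
    """
    rng = np.random.default_rng(seed)
    # One row per posterior draw of (p_tp, p_fn, p_fp, p_tn); rows sum to 1.
    theta = rng.dirichlet(np.array([tp, fn, fp, tn]) + 1.0, size=n_samples)
    p_tp, p_fn, p_fp, p_tn = theta.T
    metrics = {
        "accuracy":  p_tp + p_tn,
        "precision": p_tp / (p_tp + p_fp),
        "recall":    p_tp / (p_tp + p_fn),
    }
    lo, hi = (1 - level) / 2, 1 - (1 - level) / 2
    return {name: np.quantile(vals, [lo, hi]) for name, vals in metrics.items()}

# Example: a test set of only 50 cases leaves wide credible intervals,
# illustrating the paper's point that small-sample metrics are uncertain.
print(metric_credible_intervals(tp=20, fn=5, fp=5, tn=20))
```

The same machinery supports the sample-size use mentioned in the abstract: simulate confusion matrices of increasing size and pick the smallest test set whose interval width falls below the desired precision.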