Takahashi Kanae, Yamamoto Kouji, Kuchiba Aya, Koyama Tatsuki
Department of Medical Statistics, Osaka City University Graduate School of Medicine, Osaka, Japan.
Department of Biostatistics, Hyogo College of Medicine, Hyogo, Japan.
Appl Intell (Dordr). 2022 Mar;52(5):4961-4972. doi: 10.1007/s10489-021-02635-5. Epub 2021 Jul 31.
Binary classification problems are common in the medical field, and sensitivity, specificity, accuracy, and negative and positive predictive values are often used to measure the performance of a binary predictor. In computer science, a classifier is usually evaluated with precision (positive predictive value) and recall (sensitivity). As a single summary measure of a classifier's performance, the F1 score, defined as the harmonic mean of precision and recall, is widely used in information retrieval and information extraction evaluation because it has favorable characteristics, especially when the prevalence is low. Some statistical methods for inference have been developed for the F1 score in binary classification problems; however, they have not been extended to multi-class classification. There are three types of F1 scores, and the statistical properties of these scores have hardly ever been discussed. We propose methods based on the large-sample multivariate central limit theorem for estimating F1 scores with confidence intervals.
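As an illustration of the definition above, the following minimal sketch computes the F1 score as the harmonic mean of precision and recall from raw counts. The abstract does not name the three multi-class F1 variants, so the macro-averaged and micro-averaged forms shown here are only commonly used conventions and are an assumption, not necessarily the authors' definitions. The function names `f1_score` and `macro_and_micro_f1` and the confusion-matrix layout are hypothetical, and no confidence intervals are computed.

```python
from typing import List, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 for one class: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_and_micro_f1(confusion: List[List[int]]) -> Tuple[float, float]:
    """Assumed layout: confusion[i][j] counts items of true class i
    that were predicted as class j."""
    k = len(confusion)
    per_class = []
    tp_sum = fp_sum = fn_sum = 0
    for c in range(k):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(k)) - tp  # predicted c, truly other
        fn = sum(confusion[c]) - tp                       # truly c, predicted other
        per_class.append(f1_score(tp, fp, fn))
        tp_sum += tp
        fp_sum += fp
        fn_sum += fn
    macro = sum(per_class) / k                 # average of per-class F1 scores
    micro = f1_score(tp_sum, fp_sum, fn_sum)   # F1 computed from pooled counts
    return macro, micro


# Illustrative counts only, not data from the paper.
binary_f1 = f1_score(tp=8, fp=2, fn=4)
macro_f1, micro_f1 = macro_and_micro_f1([[5, 1, 0], [1, 3, 1], [0, 1, 8]])
```

In this sketch the macro average weights every class equally, whereas the micro average pools counts across classes, so the two can differ noticeably when class frequencies are unbalanced.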