Chicco Davide, Tötsch Niklas, Jurman Giuseppe
Krembil Research Institute, Toronto, Ontario, Canada.
Universität Duisburg-Essen, Essen, Germany.
BioData Min. 2021 Feb 4;14(1):13. doi: 10.1186/s13040-021-00244-z.
Evaluating binary classifications is a pivotal task in statistics and machine learning, because it can influence decisions in multiple areas, including, for example, the prognosis or therapy of patients in critical condition. The scientific community has not yet agreed on a general-purpose statistical indicator for evaluating two-class confusion matrices (which contain true positives, true negatives, false positives, and false negatives), even though the advantages of the Matthews correlation coefficient (MCC) over accuracy and F score have already been shown. In this manuscript, we reaffirm that MCC is a robust metric that summarizes classifier performance in a single value when positive and negative cases are of equal importance. We compare MCC to other metrics that value positive and negative cases equally: balanced accuracy (BA), bookmaker informedness (BM), and markedness (MK). We explain the mathematical relationships between MCC and these indicators, then show some use cases and a bioinformatics scenario where these metrics disagree and where MCC generates a more informative response. Additionally, we describe three exceptions where BM can be more appropriate: analyzing classifications where dataset prevalence is unrepresentative, comparing classifiers on different datasets, and assessing the random guessing level of a classifier. Except in these cases, we believe that MCC is the most informative among the single metrics discussed, and we suggest it as a standard measure for scientists of all fields. A Matthews correlation coefficient close to +1, in fact, means having high values for all the other confusion matrix metrics. The same cannot be said for balanced accuracy, markedness, bookmaker informedness, accuracy, and F score.
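The relationships between these metrics can be illustrated with a minimal sketch. It uses the standard textbook definitions of BA, BM, MK, and MCC rather than formulas quoted from this abstract, and the function name `confusion_metrics` and the example counts are hypothetical, chosen only to show how a high BA can coexist with a low MCC on an imbalanced dataset.

```python
from math import sqrt

def confusion_metrics(tp, tn, fp, fn):
    """Compute BA, BM, MK, and MCC from a two-class confusion matrix.

    Assumes every marginal total (tp+fn, tn+fp, tp+fp, tn+fn) is non-zero;
    otherwise some of the rates below are undefined.
    """
    tpr = tp / (tp + fn)  # sensitivity (recall)
    tnr = tn / (tn + fp)  # specificity
    ppv = tp / (tp + fp)  # precision
    npv = tn / (tn + fn)  # negative predictive value

    ba = (tpr + tnr) / 2  # balanced accuracy
    bm = tpr + tnr - 1    # bookmaker informedness
    mk = ppv + npv - 1    # markedness
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"BA": ba, "BM": bm, "MK": mk, "MCC": mcc}

# Hypothetical imbalanced example: 100 positives, 10,000 negatives.
m = confusion_metrics(tp=90, tn=9000, fp=1000, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.3f}")

# With these definitions, BA = (BM + 1) / 2 and MCC^2 = BM * MK,
# so MCC is high only when both BM and MK are high.
print(abs(m["MCC"] ** 2 - m["BM"] * m["MK"]) < 1e-12)
```

In this example sensitivity and specificity are both 0.9, so BA and BM look strong, but precision is low, which pulls MK and therefore MCC down. This is the kind of disagreement between metrics that the abstract refers to.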