The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation.

Authors

Chicco Davide, Tötsch Niklas, Jurman Giuseppe

Affiliations

Krembil Research Institute, Toronto, Ontario, Canada.

Universität Duisburg-Essen, Essen, Germany.

Publication information

BioData Min. 2021 Feb 4;14(1):13. doi: 10.1186/s13040-021-00244-z.

DOI: 10.1186/s13040-021-00244-z

PMID: 33541410

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7863449/

Abstract

Evaluating binary classifications is a pivotal task in statistics and machine learning, because it can influence decisions in multiple areas, including for example prognosis or therapies of patients in critical conditions. The scientific community has not yet agreed on a general-purpose statistical indicator for evaluating two-class confusion matrices (having true positives, true negatives, false positives, and false negatives), even though the advantages of the Matthews correlation coefficient (MCC) over accuracy and F score have already been shown.

In this manuscript, we reaffirm that MCC is a robust metric that summarizes the classifier performance in a single value, if positive and negative cases are of equal importance. We compare MCC to other metrics which value positive and negative cases equally: balanced accuracy (BA), bookmaker informedness (BM), and markedness (MK). We explain the mathematical relationships between MCC and these indicators, then show some use cases and a bioinformatics scenario where these metrics disagree and where MCC generates a more informative response.

Additionally, we describe three exceptions where BM can be more appropriate: analyzing classifications where dataset prevalence is unrepresentative, comparing classifiers on different datasets, and assessing the random guessing level of a classifier. Except in these cases, we believe that MCC is the most informative among the single metrics discussed, and suggest it as the standard measure for scientists of all fields. A Matthews correlation coefficient close to +1, in fact, means having high values for all the other confusion matrix metrics. The same cannot be said for balanced accuracy, markedness, bookmaker informedness, accuracy and F score.
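The abstract refers to mathematical relationships between MCC and BA, BM and MK without spelling them out. As an illustration only (not code from the paper), the Python sketch below computes the four metrics from a two-class confusion matrix using their standard definitions and checks two well-known identities: BM = 2·BA − 1, and MCC² = BM·MK, so MCC is the geometric mean of BM and MK carrying their shared sign. The names `ConfusionMatrix` and `metrics` are hypothetical, and the counts in the example are invented for demonstration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts of a two-class confusion matrix (hypothetical helper type)."""
    tp: int
    fn: int
    fp: int
    tn: int


def metrics(cm: ConfusionMatrix) -> dict[str, float]:
    """Return TPR, TNR, PPV, NPV, BA, BM, MK and MCC for one confusion matrix.

    Assumes every marginal (tp+fn, tn+fp, tp+fp, tn+fn) is non-zero;
    otherwise some ratios are undefined and ZeroDivisionError is raised.
    """
    tpr = cm.tp / (cm.tp + cm.fn)  # sensitivity / recall
    tnr = cm.tn / (cm.tn + cm.fp)  # specificity
    ppv = cm.tp / (cm.tp + cm.fp)  # precision
    npv = cm.tn / (cm.tn + cm.fn)  # negative predictive value
    ba = (tpr + tnr) / 2           # balanced accuracy
    bm = tpr + tnr - 1             # bookmaker informedness
    mk = ppv + npv - 1             # markedness
    denom = math.sqrt(
        (cm.tp + cm.fp) * (cm.tp + cm.fn) * (cm.tn + cm.fp) * (cm.tn + cm.fn)
    )
    mcc = (cm.tp * cm.tn - cm.fp * cm.fn) / denom
    return {"TPR": tpr, "TNR": tnr, "PPV": ppv, "NPV": npv,
            "BA": ba, "BM": bm, "MK": mk, "MCC": mcc}


if __name__ == "__main__":
    # Invented counts: 100 positives, 9,900 negatives; high sensitivity and
    # specificity, but most predicted positives are false positives.
    m = metrics(ConfusionMatrix(tp=95, fn=5, fp=500, tn=9400))
    for name, value in m.items():
        print(f"{name}: {value:.3f}")

    # Standard identities: BM = 2*BA - 1 and MCC^2 = BM * MK.
    assert math.isclose(m["BM"], 2 * m["BA"] - 1)
    assert math.isclose(m["MCC"] ** 2, m["BM"] * m["MK"])

    # Same TPR and TNR, ten times as many negatives: BM depends only on
    # TPR and TNR, so it is unchanged, while MCC changes with prevalence.
    rare = metrics(ConfusionMatrix(tp=95, fn=5, fp=5000, tn=94000))
    print(f"BM: {rare['BM']:.3f}  MCC: {rare['MCC']:.3f}")
```

With these made-up counts, BA is about 0.95 and BM about 0.90, yet precision (PPV) is only about 0.16; MK (about 0.16) and MCC (about 0.38) are the scores that reflect it. Multiplying the negatives by ten leaves TPR, TNR and therefore BM unchanged while MCC drops to about 0.13, which is consistent with the abstract listing unrepresentative prevalence and cross-dataset comparisons among the cases where BM can be preferable.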

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7960/7863449/ef34245a7b3d/13040_2021_244_Fig1_HTML.jpg

Similar articles

The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation.

BMC Genomics. 2020 Jan 2;21(1):6. doi: 10.1186/s12864-019-6413-7.

A statistical comparison between Matthews correlation coefficient (MCC), prevalence threshold, and Fowlkes-Mallows index.

J Biomed Inform. 2023 Aug;144:104426. doi: 10.1016/j.jbi.2023.104426. Epub 2023 Jun 21.

Mind your prevalence!

J Cheminform. 2024 Apr 15;16(1):43. doi: 10.1186/s13321-024-00837-w.

The Matthews correlation coefficient (MCC) should replace the ROC AUC as the standard metric for assessing binary classification.

BioData Min. 2023 Feb 17;16(1):4. doi: 10.1186/s13040-023-00322-4.

Prediction of low Apgar score at five minutes following labor induction intervention in vaginal deliveries: machine learning approach for imbalanced data at a tertiary hospital in North Tanzania.

BMC Pregnancy Childbirth. 2022 Apr 1;22(1):275. doi: 10.1186/s12884-022-04534-0.

Optimal classifier for imbalanced data using Matthews Correlation Coefficient metric.

PLoS One. 2017 Jun 2;12(6):e0177678. doi: 10.1371/journal.pone.0177678. eCollection 2017.

Challenges in the real world use of classification accuracy metrics: From recall and precision to the Matthews correlation coefficient.

PLoS One. 2023 Oct 4;18(10):e0291908. doi: 10.1371/journal.pone.0291908. eCollection 2023.

Why Cohen's Kappa should be avoided as performance measure in classification.

PLoS One. 2019 Sep 26;14(9):e0222916. doi: 10.1371/journal.pone.0222916. eCollection 2019.

A misbehavior detection framework for cooperative intelligent transport systems.

ISA Trans. 2023 Jan;132:52-60. doi: 10.1016/j.isatra.2022.08.029. Epub 2022 Sep 16.

Cited by

Trait mediation explains decadal distributional shifts for a wide range of insect taxa.

Nat Commun. 2025 Aug 30;16(1):8131. doi: 10.1038/s41467-025-63093-y.

Improving classification on imbalanced genomic data via KDE-based synthetic sampling.

BioData Min. 2025 Aug 29;18(1):60. doi: 10.1186/s13040-025-00474-5.

Enhanced detection of patients with previous COVID-19: superiority of the double diffusion technique.

BMJ Open Respir Res. 2025 Aug 25;12(1):e002561. doi: 10.1136/bmjresp-2024-002561.

Unveiling Novel Arginase Inhibitors for Cutaneous Leishmaniasis Using Drug Repurposing and Virtual Screening Approaches.

J Cell Biochem. 2025 Aug;126(8):e70060. doi: 10.1002/jcb.70060.

Pulmonary diffusing capacity and dyspnoea following COVID-19: Insights from multicentre datasets.

Data Brief. 2025 Jul 25;62:111925. doi: 10.1016/j.dib.2025.111925. eCollection 2025 Oct.

Exo-Tox: Identifying Exotoxins from secreted bacterial proteins.

BioData Min. 2025 Aug 8;18(1):52. doi: 10.1186/s13040-025-00469-2.

Machine Learning and Artificial Intelligence for Infectious Disease Surveillance, Diagnosis, and Prognosis.

Viruses. 2025 Jun 23;17(7):882. doi: 10.3390/v17070882.

Target Mapping in Cancer: Ligandable Protein Pockets on 3D OncoPPI Networks.

Pharmaceuticals (Basel). 2025 Jun 25;18(7):958. doi: 10.3390/ph18070958.

The Use of Selected Machine Learning Methods in Dairy Cattle Farming: A Review.

Animals (Basel). 2025 Jul 10;15(14):2033. doi: 10.3390/ani15142033.

AllerTrans: a deep learning method for predicting the allergenicity of protein sequences.

Biol Methods Protoc. 2025 Jul 9;10(1):bpaf040. doi: 10.1093/biomethods/bpaf040. eCollection 2025.
