Departamento de Lenguajes y Sistemas Informáticos, Universidad Nacional de Educación a Distancia, Madrid, Spain.
Signal Theory and Communications Department, University Carlos III Madrid, Madrid, Spain.
PLoS One. 2014 Jan 10;9(1):e84217. doi: 10.1371/journal.pone.0084217. eCollection 2014.
The most widely used measure of performance, accuracy, suffers from a paradox: predictive models with a given level of accuracy may have greater predictive power than models with higher accuracy. Despite optimizing the classification error rate, high-accuracy models may fail to capture crucial information transfer in the classification task. We present evidence of this behavior by means of a combinatorial analysis in which every possible contingency matrix of 2-, 3- and 4-class classifiers is depicted on the entropy triangle, a more reliable information-theoretic tool for classification assessment. Motivated by this, we develop from first principles a measure of classification performance that takes into consideration the information learned by classifiers. We are then able to obtain the entropy-modulated accuracy (EMA), a pessimistic estimate of the expected accuracy with the influence of the input distribution factored out, and the normalized information transfer (NIT) factor, a measure of how efficiently information is transmitted from the input to the output set of classes. The EMA is a more natural measure of classification performance than accuracy when the heuristic to maximize is the transfer of information through the classifier rather than the classification error count. The NIT factor measures the effectiveness of the learning process in classifiers and also makes it harder for them to "cheat" using techniques like specialization, while also promoting the interpretability of results. Their use is demonstrated in a mind-reading task competition that aims at decoding the identity of a video stimulus from magnetoencephalography recordings. We show how the EMA and the NIT factor reject rankings based on accuracy, selecting more meaningful and interpretable classifiers.
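The abstract's two measures are both functions of the entropies of the joint distribution estimated from a contingency (confusion) matrix. The sketch below, a hypothetical illustration rather than the paper's reference implementation, assumes the definitions EMA = 2^(-H(X|Y)) and NIT = 2^(I(X;Y)) / k for a k-class problem, with all entropies in bits:

```python
import numpy as np

def entropy_measures(cm):
    """Estimate EMA and the NIT factor from a k-by-k contingency matrix.

    Assumed definitions (hedged, not taken from the paper verbatim):
      EMA = 2 ** (-H(X|Y))        -- accuracy with input distribution factored out
      NIT = 2 ** I(X;Y) / k       -- efficiency of information transfer
    where X is the true class and Y the predicted class.
    """
    p = cm / cm.sum()              # joint distribution P(X, Y)
    px = p.sum(axis=1)             # true-class marginal P(X)
    py = p.sum(axis=0)             # predicted-class marginal P(Y)

    def H(dist):
        d = dist[dist > 0]         # ignore zero-probability cells
        return -(d * np.log2(d)).sum()

    h_x, h_y, h_xy = H(px), H(py), H(p.ravel())
    mi = h_x + h_y - h_xy          # mutual information I(X; Y)
    h_x_given_y = h_xy - h_y       # conditional entropy H(X | Y)
    k = cm.shape[0]                # number of classes

    ema = 2.0 ** (-h_x_given_y)    # entropy-modulated accuracy
    nit = 2.0 ** mi / k            # normalized information transfer factor
    return ema, nit
```

On a perfect, balanced 2-class confusion matrix both measures are 1; for a classifier that sends every example to a single class, I(X;Y) = 0 and both drop to 1/2, matching the abstract's point that accuracy-style scores can be achieved without any information transfer.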