Avoid Oversimplifications in Machine Learning: Going beyond the Class-Prediction Accuracy.

Authors

Sung Yang Ho, Limsoon Wong, Wilson Wen Bin Goh

Affiliations

School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore.

Department of Computer Science, National University of Singapore, Singapore 117417, Singapore.

Publication information

Patterns (N Y). 2020 May 8;1(2):100025. doi: 10.1016/j.patter.2020.100025.

PMID: 33205097

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7660406/

Abstract

Class-prediction accuracy provides a quick but superficial way of determining classifier performance. It does not inform on the reproducibility of the findings or on whether the selected or constructed features are meaningful and specific. Furthermore, class-prediction accuracy oversummarizes and does not inform on how training and learning have been accomplished: two classifiers providing the same performance in one validation can disagree on many future validations. It provides no explainability of the decision-making process and is not objective, as its value is also affected by class proportions in the validation set. These issues do not mean that we should omit class-prediction accuracy. Instead, it needs to be enriched with accompanying evidence and tests that supplement and contextualize the reported accuracy. This additional evidence serves as augmentation that can help us perform machine learning better while avoiding naive reliance on oversimplified metrics.
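
Two of the abstract's claims, that accuracy shifts with class proportions in the validation set and that equal accuracy can hide different decision behaviour, can be illustrated with a small worked example. The following is a minimal sketch in plain Python, not the authors' method. It assumes a hypothetical binary classifier with fixed sensitivity 0.6 and specificity 0.9, and two hypothetical classifiers whose errors fall on different samples. The helper names (accuracy, balanced_accuracy, disagreement, fixed_rate_predictions, flip) are illustrative only, and balanced accuracy is swapped in as one common prevalence-independent complement; the paper's own recommended supplements may differ.

```python
"""Toy illustration of two limitations of class-prediction accuracy.

All classifiers, rates, and sample sizes here are hypothetical.
"""
from typing import List, Sequence, Set, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    pos_preds = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg_preds = [p for t, p in zip(y_true, y_pred) if t == 0]
    sensitivity = sum(p == 1 for p in pos_preds) / len(pos_preds)
    specificity = sum(p == 0 for p in neg_preds) / len(neg_preds)
    return (sensitivity + specificity) / 2


def disagreement(pred_a: Sequence[int], pred_b: Sequence[int]) -> float:
    """Fraction of samples on which two classifiers predict different labels."""
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)


def fixed_rate_predictions(
    n_pos: int, n_neg: int, sens: float = 0.6, spec: float = 0.9
) -> Tuple[List[int], List[int]]:
    """True labels and predictions for a hypothetical classifier with fixed
    sensitivity and specificity, on n_pos positives and n_neg negatives."""
    y_true = [1] * n_pos + [0] * n_neg
    tp = round(sens * n_pos)  # positives correctly predicted as 1
    tn = round(spec * n_neg)  # negatives correctly predicted as 0
    y_pred = [1] * tp + [0] * (n_pos - tp) + [0] * tn + [1] * (n_neg - tn)
    return y_true, y_pred


def flip(labels: Sequence[int], indices: Set[int]) -> List[int]:
    """Copy of binary labels with the given positions flipped (simulated errors)."""
    return [1 - y if i in indices else y for i, y in enumerate(labels)]


# 1) The same hypothetical classifier on validation sets with different class proportions.
for n_pos, n_neg in [(50, 50), (10, 90)]:
    y_true, y_pred = fixed_rate_predictions(n_pos, n_neg)
    print(f"{n_pos}:{n_neg}  accuracy={accuracy(y_true, y_pred):.2f}  "
          f"balanced={balanced_accuracy(y_true, y_pred):.2f}")

# 2) Two hypothetical classifiers with equal accuracy but errors on different samples.
y_val = [1] * 5 + [0] * 5
pred_a = flip(y_val, {0, 1})  # classifier A misses two positives
pred_b = flip(y_val, {8, 9})  # classifier B calls two negatives positive
print(f"A={accuracy(y_val, pred_a):.2f}  B={accuracy(y_val, pred_b):.2f}  "
      f"disagreement={disagreement(pred_a, pred_b):.2f}")
```

Run as-is, the sketch reports accuracy 0.75 on the 50:50 set and 0.87 on the 10:90 set, while balanced accuracy stays at 0.75 in both: the classifier has not improved, the negatives simply dominate. Classifiers A and B both score 0.80 yet disagree on 0.40 of the samples, so a single accuracy figure cannot tell them apart even though their errors, and their behaviour on new data, differ.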

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a00/7660406/f9557e0da735/fx1.jpg

Similar articles

Artificial Intelligence-Based Traditional Chinese Medicine Assistive Diagnostic System: Validation Study.

JMIR Med Inform. 2020 Jun 15;8(6):e17608. doi: 10.2196/17608.

Evaluation of performance metrics for histopathological image classifier optimization.

Annu Int Conf IEEE Eng Med Biol Soc. 2014;2014:1933-6. doi: 10.1109/EMBC.2014.6943990.

Impact of Machine Learning With Multiparametric Magnetic Resonance Imaging of the Breast for Early Prediction of Response to Neoadjuvant Chemotherapy and Survival Outcomes in Breast Cancer Patients.

Invest Radiol. 2019 Feb;54(2):110-117. doi: 10.1097/RLI.0000000000000518.

Maximizing lipocalin prediction through balanced and diversified training set and decision fusion.

Comput Biol Chem. 2015 Dec;59 Pt A:101-10. doi: 10.1016/j.compbiolchem.2015.09.011. Epub 2015 Sep 28.

Training based on ligand efficiency improves prediction of bioactivities of ligands and drug target proteins in a machine learning approach.

J Chem Inf Model. 2013 Oct 28;53(10):2525-37. doi: 10.1021/ci400240u. Epub 2013 Sep 24.

Clear Cell Renal Cell Carcinoma: Machine Learning-Based Quantitative Computed Tomography Texture Analysis for Prediction of Fuhrman Nuclear Grade.

Eur Radiol. 2019 Mar;29(3):1153-1163. doi: 10.1007/s00330-018-5698-2. Epub 2018 Aug 30.

Prediction of myopia development among Chinese school-aged children using refraction data from electronic medical records: A retrospective, multicentre machine learning study.

PLoS Med. 2018 Nov 6;15(11):e1002674. doi: 10.1371/journal.pmed.1002674. eCollection 2018 Nov.

Machine learning assessment of myocardial ischemia using angiography: Development and retrospective validation.

PLoS Med. 2018 Nov 13;15(11):e1002693. doi: 10.1371/journal.pmed.1002693. eCollection 2018 Nov.

A hierarchical anatomical classification schema for prediction of phenotypic side effects.

PLoS One. 2018 Mar 1;13(3):e0193959. doi: 10.1371/journal.pone.0193959. eCollection 2018.

Cited by

Ten quick tips for ensuring machine learning model validity.

PLoS Comput Biol. 2024 Sep 19;20(9):e1012402. doi: 10.1371/journal.pcbi.1012402. eCollection 2024 Sep.

Development and validation of radiomics models for the prediction of diagnosis of classic trigeminal neuralgia.

Front Neurosci. 2023 Oct 9;17:1188590. doi: 10.3389/fnins.2023.1188590. eCollection 2023.

Systems Biology and Omics Approaches for Complex Human Diseases.

Biomolecules. 2023 Jul 6;13(7):1080. doi: 10.3390/biom13071080.

Awareness: An empirical model.

Front Psychol. 2022 Dec 9;13:933183. doi: 10.3389/fpsyg.2022.933183. eCollection 2022.

Role of brain 2-[18F]fluoro-2-deoxy-D-glucose-positron-emission tomography as survival predictor in amyotrophic lateral sclerosis.

Eur J Nucl Med Mol Imaging. 2023 Feb;50(3):784-791. doi: 10.1007/s00259-022-05987-3. Epub 2022 Oct 29.

Artificial Intelligence in Orthopedic Radiography Analysis: A Narrative Review.

Diagnostics (Basel). 2022 Sep 16;12(9):2235. doi: 10.3390/diagnostics12092235.

Editorial: Prediction and explanation in biomedicine using network-based approaches.

Front Genet. 2022 Sep 2;13:967936. doi: 10.3389/fgene.2022.967936. eCollection 2022.

Doppelgänger spotting in biomedical gene expression data.

iScience. 2022 Jul 19;25(8):104788. doi: 10.1016/j.isci.2022.104788. eCollection 2022 Aug 19.

Detecting Unusual Intravenous Infusion Alerting Patterns with Machine Learning Algorithms.

Biomed Instrum Technol. 2022 Apr 1;56(2):58-70. doi: 10.2345/0899-8205-56.2.58.

Systems serology detects functionally distinct coronavirus antibody features in children and elderly.

Nat Commun. 2021 Apr 1;12(1):2037. doi: 10.1038/s41467-021-22236-7.
