Evaluating the factors influencing accuracy, interpretability, and reproducibility in the use of machine learning classifiers in biology to enable standardization.

Authors

Martinez Kaitlyn M, Wilding Kristen, Llewellyn Trent R, Jacobsen Daniel E, Montoya Makaela M, Kubicek-Sutherland Jessica Z, Batni Sweta, Manore Carrie, Mukundan Harshini

Affiliations

A-1 Information Systems and Modeling, Los Alamos National Laboratory, Los Alamos, NM, United States of America.

T-6 Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos, NM, United States of America.

Publication information

Sci Rep. 2025 May 13;15(1):16651. doi: 10.1038/s41598-025-00245-6.

DOI:10.1038/s41598-025-00245-6
PMID:40360553
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12075784/
Abstract

The complexity and variability of biological data have promoted the increased use of machine learning methods to understand processes and predict outcomes. These same features complicate reliable, reproducible, interpretable, and responsible use of such methods, resulting in questionable relevance of the derived outcomes. Here we systematically explore challenges associated with applying machine learning to predict and understand biological processes using a well-characterized in vitro experimental system. We evaluated factors that vary while applying machine learning classifiers: (1) type of biochemical signature (transcripts vs. proteins), (2) data curation methods (pre- and post-processing), and (3) choice of machine learning classifier. Using accuracy, generalizability, interpretability, and reproducibility as metrics, we found that the above factors significantly modulate outcomes even within a simple model system. Our results caution against the unregulated use of machine learning methods in the biological sciences, and strongly advocate the need for data standards and validation toolkits for such studies.
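
The abstract's point that pre-processing and classifier choice can each change the measured accuracy can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: the data are synthetic (one informative feature plus one large-scale noise feature), and the two classifiers (nearest centroid and 1-nearest-neighbour) and the helper names `make_data`, `standardize`, and `evaluate` are hypothetical stand-ins, not the data, curation steps, or classifiers used in the paper.

```python
import random
import statistics


def make_data(n_per_class, seed):
    """Synthetic two-class data (an assumption, not the paper's data).

    Feature 0 is informative on a small scale; feature 1 is pure noise
    on a much larger scale, so unscaled distances are dominated by it.
    """
    rng = random.Random(seed)
    rows = []
    for label in (0, 1):
        for _ in range(n_per_class):
            informative = rng.gauss(2.0 * label, 1.0)
            noise = rng.gauss(0.0, 1000.0)
            rows.append(([informative, noise], label))
    rng.shuffle(rows)
    return rows


def standardize(train, test):
    """z-score every feature using statistics from the training split only."""
    n_features = len(train[0][0])
    means, sds = [], []
    for j in range(n_features):
        column = [x[j] for x, _ in train]
        means.append(statistics.fmean(column))
        sds.append(statistics.pstdev(column) or 1.0)  # guard against zero spread

    def scale(rows):
        return [([(x[j] - means[j]) / sds[j] for j in range(n_features)], y)
                for x, y in rows]

    return scale(train), scale(test)


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest_centroid(train):
    """Return a predictor that assigns the label of the closest class mean."""
    centroids = {}
    for label in {y for _, y in train}:
        members = [x for x, y in train if y == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*members)]
    return lambda x: min(centroids, key=lambda c: sq_dist(x, centroids[c]))


def one_nn(train):
    """Return a predictor that copies the label of the closest training row."""
    return lambda x: min(train, key=lambda row: sq_dist(x, row[0]))[1]


def evaluate(seed, make_classifier, scale):
    """Held-out accuracy for one seed, one classifier, one scaling choice."""
    rows = make_data(n_per_class=100, seed=seed)
    cut = int(0.7 * len(rows))  # 70/30 train/test split
    train, test = rows[:cut], rows[cut:]
    if scale:
        train, test = standardize(train, test)
    predict = make_classifier(train)
    hits = sum(predict(x) == y for x, y in test)
    return hits / len(test)


if __name__ == "__main__":
    seeds = range(5)
    for name, clf in (("nearest_centroid", nearest_centroid), ("one_nn", one_nn)):
        for scale in (False, True):
            accs = [evaluate(s, clf, scale) for s in seeds]
            print(f"{name:17s} scaled={scale!s:5s} "
                  f"mean={statistics.fmean(accs):.2f} "
                  f"min={min(accs):.2f} max={max(accs):.2f}")
```

Reporting the spread across seeds alongside the mean is a simple reproducibility check: a fixed seed gives identical results on every run, while the min/max range shows how much a single train/test split can move the reported accuracy.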


Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/92ab/12075784/a1595bcda736/41598_2025_245_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/92ab/12075784/b7eb6e7af03a/41598_2025_245_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/92ab/12075784/b74dd4ec5472/41598_2025_245_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/92ab/12075784/94518092a7f9/41598_2025_245_Fig4_HTML.jpg

Similar articles

1
Evaluating the factors influencing accuracy, interpretability, and reproducibility in the use of machine learning classifiers in biology to enable standardization.
Sci Rep. 2025 May 13;15(1):16651. doi: 10.1038/s41598-025-00245-6.
2
A critical moment in machine learning in medicine: on reproducible and interpretable learning.
Acta Neurochir (Wien). 2024 Jan 16;166(1):14. doi: 10.1007/s00701-024-05892-8.
3
Predictive modeling and optimization in dermatology: Machine learning for skin disease classification.
Comput Biol Med. 2025 May;189:109946. doi: 10.1016/j.compbiomed.2025.109946. Epub 2025 Mar 3.
4
Applying machine learning to predict bowel preparation adequacy in elderly patients for colonoscopy: development and validation of a web-based prediction tool.
Ann Med. 2025 Dec;57(1):2474172. doi: 10.1080/07853890.2025.2474172. Epub 2025 Mar 11.
5
On the interpretability of machine learning-based model for predicting hypertension.
BMC Med Inform Decis Mak. 2019 Jul 29;19(1):146. doi: 10.1186/s12911-019-0874-0.
6
A Framework for Effective Application of Machine Learning to Microbiome-Based Classification Problems.
mBio. 2020 Jun 9;11(3):e00434-20. doi: 10.1128/mBio.00434-20.
7
Textural differences between renal cell carcinoma subtypes: Machine learning-based quantitative computed tomography texture analysis with independent external validation.
Eur J Radiol. 2018 Oct;107:149-157. doi: 10.1016/j.ejrad.2018.08.014. Epub 2018 Aug 16.
8
Machine learning-based risk prediction for major adverse cardiovascular events in a Brazilian hospital: Development, external validation, and interpretability.
PLoS One. 2024 Oct 11;19(10):e0311719. doi: 10.1371/journal.pone.0311719. eCollection 2024.
9
Social Reminiscence in Older Adults' Everyday Conversations: Automated Detection Using Natural Language Processing and Machine Learning.
J Med Internet Res. 2020 Sep 15;22(9):e19133. doi: 10.2196/19133.
10
Evaluation of machine learning methods for prediction of heart failure mortality and readmission: meta-analysis.
BMC Cardiovasc Disord. 2025 Apr 7;25(1):264. doi: 10.1186/s12872-025-04700-0.
