Interpretable machine learning methods for predictions in systems biology from omics data.

Authors

Sidak David, Schwarzerová Jana, Weckwerth Wolfram, Waldherr Steffen

Affiliations

Department of Functional and Evolutionary Ecology, Faculty of Life Sciences, Molecular Systems Biology (MOSYS), University of Vienna, Vienna, Austria.

Department of Biomedical Engineering, Faculty of Electrical Engineering and Communication, Brno University of Technology, Brno, Czech Republic.

Publication information

Front Mol Biosci. 2022 Oct 17;9:926623. doi: 10.3389/fmolb.2022.926623. eCollection 2022.

DOI:10.3389/fmolb.2022.926623

PMID:36387282

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9650551/

Abstract

Machine learning has become a powerful tool for systems biologists, from diagnosing cancer to optimizing kinetic models and predicting the state, growth dynamics, or type of a cell. Potential predictions from complex biological data sets obtained by "omics" experiments seem endless, but are often not the main objective of biological research. Often we want to understand the molecular mechanisms of a disease to develop new therapies, or we need to justify a crucial decision that is derived from a prediction. In order to gain such knowledge from data, machine learning models need to be extended. A recent trend to achieve this is to design "interpretable" models. However, the notions around interpretability are sometimes ambiguous, and a universal recipe for building well-interpretable models is missing. With this work, we want to familiarize systems biologists with the concept of model interpretability in machine learning. We consider data sets, data preparation, machine learning methods, and software tools relevant to omics research in systems biology. Finally, we try to answer the question: "What is interpretability?" We introduce views from the interpretable machine learning community and propose a scheme for categorizing studies on omics data. We then apply these tools to review and categorize recent studies where predictive machine learning models have been constructed from non-sequential omics data.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/56da/9650551/bc7d066bade5/fmolb-09-926623-g001.jpg

Similar articles

Interpretable deep learning in single-cell omics.

Bioinformatics. 2024 Jun 3;40(6). doi: 10.1093/bioinformatics/btae374.

Interpretable meta-learning of multi-omics data for survival analysis and pathway enrichment.

Bioinformatics. 2023 Apr 3;39(4). doi: 10.1093/bioinformatics/btad113.

A Mobile App That Addresses Interpretability Challenges in Machine Learning-Based Diabetes Predictions: Survey-Based User Study.

JMIR Form Res. 2023 Nov 13;7:e50328. doi: 10.2196/50328.

Using machine learning approaches for multi-omics data analysis: A review.

Biotechnol Adv. 2021 Jul-Aug;49:107739. doi: 10.1016/j.biotechadv.2021.107739. Epub 2021 Mar 29.

Interpretable Decision Sets: A Joint Framework for Description and Prediction.

KDD. 2016 Aug;2016:1675-1684. doi: 10.1145/2939672.2939874.

Machine learning meets omics: applications and perspectives.

Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab460.

SMILE: systems metabolomics using interpretable learning and evolution.

BMC Bioinformatics. 2021 May 28;22(1):284. doi: 10.1186/s12859-021-04209-1.

Translational Metabolomics of Head Injury: Exploring Dysfunctional Cerebral Metabolism with Ex Vivo NMR Spectroscopy-Based Metabolite Quantification

Machine Learning: A New Prospect in Multi-Omics Data Analysis of Cancer.

Front Genet. 2022 Jan 27;13:824451. doi: 10.3389/fgene.2022.824451. eCollection 2022.

Cited by

ATF6 activation alters colonic lipid metabolism causing tumour-associated microbial adaptation.

Nat Metab. 2025 Sep 1. doi: 10.1038/s42255-025-01350-6.

Advances in Functional Genomics for Exploring Abiotic Stress Tolerance Mechanisms in Cereals.

Plants (Basel). 2025 Aug 8;14(16):2459. doi: 10.3390/plants14162459.

Artificial intelligence: the human response to approach the complexity of big data in biology.

Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf057.

Modeling the Interactions Between Chemicals and Proteins to Predict the Health Consequences of Air Pollution.

Int J Environ Res Public Health. 2025 Mar 13;22(3):418. doi: 10.3390/ijerph22030418.

Application of artificial intelligence in the diagnosis of malignant digestive tract tumors: focusing on opportunities and challenges in endoscopy and pathology.

J Transl Med. 2025 Apr 9;23(1):412. doi: 10.1186/s12967-025-06428-z.

Editorial for the Special Issue: Bioinformatics and Computational Biology for Cancer Prediction and Prognosis.

Genes (Basel). 2025 Jan 28;16(2):167. doi: 10.3390/genes16020167.

Sequence-Only Prediction of Super-Enhancers in Human Cell Lines Using Transformer Models.

Biology (Basel). 2025 Feb 7;14(2):172. doi: 10.3390/biology14020172.

The future of plant lectinology: Advanced technologies and computational tools.

BBA Adv. 2025 Jan 28;7:100145. doi: 10.1016/j.bbadva.2025.100145. eCollection 2025.

The impact of dietary fiber on colorectal cancer patients based on machine learning.

Front Nutr. 2025 Jan 24;12:1508562. doi: 10.3389/fnut.2025.1508562. eCollection 2025.

Demystifying the black box: A survey on explainable artificial intelligence (XAI) in bioinformatics.

Comput Struct Biotechnol J. 2025 Jan 10;27:346-359. doi: 10.1016/j.csbj.2024.12.027. eCollection 2025.
