MarkerML - Marker Feature Identification in Metagenomic Datasets Using Interpretable Machine Learning.

Affiliations

TCS Research, Tata Consultancy Services Ltd, Pune 411 013, India; CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), New Delhi 110 025, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201 002, India. Electronic address: https://twitter.com/NagpalSun.

TCS Research, Tata Consultancy Services Ltd, Pune 411 013, India.

Publication information

J Mol Biol. 2022 Jun 15;434(11):167589. doi: 10.1016/j.jmb.2022.167589. Epub 2022 Apr 18.

DOI: 10.1016/j.jmb.2022.167589
PMID: 35662460
Abstract

Identification of environment-specific marker features is one of the key objectives of many metagenomic studies. It aims to identify features in microbiome datasets that may serve as markers of contrasting or comparable states. Hypothesis testing and black-box machine-learnt models, which are conventionally used to identify these features, are generally not exhaustive, particularly because they do not provide any quantifiable relevance (context) of, or between, the identified features. We present the MarkerML web server, which leverages the emergence of interpretable machine learning to facilitate the contextual discovery of metagenomic features of interest. It does so through a comprehensive and automated application of Shapley Additive Explanations in conjunction with compositionality-aware hypothesis testing for multivariate microbiome datasets. MarkerML not only helps identify marker features but also provides insight into the role and interdependence of the identified features in driving the decisions of the supervised machine-learnt model. High-quality, intuitive visualizations, spanning prediction effect plots, model performance reports, feature dependency plots, Shapley- and abundance-informed cladograms (Sungrams), and hypothesis-tested violin plots, along with provisions for excluding participant bias and ensuring reproducibility of results, further aim to make the platform a useful asset for scientists in the microbiome field (and beyond). The MarkerML web server is freely available to the academic community at https://microbiome.igib.res.in/markerml/.
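The abstract pairs Shapley Additive Explanations with compositionality-aware hypothesis testing but does not describe the implementation. As a rough, hedged illustration of those two ingredients only (not MarkerML's code), the Python sketch below applies a centered log-ratio (CLR) transform, one common way to account for compositional data and assumed here for illustration, and then computes exact Shapley values by brute-force coalition enumeration. The baseline-replacement value function, the linear model, its weights, and the abundance values are all hypothetical.

```python
import itertools
import math


def clr(composition):
    """Centered log-ratio (CLR) transform of one sample's strictly positive abundances."""
    logs = [math.log(value) for value in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def exact_shapley(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to a reference input.

    Features absent from a coalition take their value from `baseline`
    (an assumed baseline-replacement value function). Enumerates all
    2**(n-1) coalitions per feature, so it only suits a few features.
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                coalition = set(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


# Hypothetical relative abundances of three taxa in three samples.
samples = [[0.50, 0.30, 0.20], [0.10, 0.60, 0.30], [0.25, 0.25, 0.50]]
transformed = [clr(s) for s in samples]

# Hypothetical linear classifier score on CLR-transformed features.
weights = [0.8, -0.5, 0.1]


def predict(z):
    return 0.2 + sum(w * v for w, v in zip(weights, z))


# Reference input: the mean CLR profile across samples.
baseline = [sum(column) / len(column) for column in zip(*transformed)]
phi = exact_shapley(predict, transformed[0], baseline)
print([round(p, 3) for p in phi])
```

For a linear model under this value function, each Shapley value reduces to w_i(x_i − b_i), and the values sum to f(x) − f(baseline) (the efficiency property), which makes the sketch easy to sanity-check. Exact enumeration grows exponentially with the number of features, so libraries such as `shap` rely on faster approximate or model-specific algorithms for realistic microbiome datasets.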

Similar articles

1. MarkerML - Marker Feature Identification in Metagenomic Datasets Using Interpretable Machine Learning.
   J Mol Biol. 2022 Jun 15;434(11):167589. doi: 10.1016/j.jmb.2022.167589. Epub 2022 Apr 18.
2. MegaR: an interactive R package for rapid sample classification and phenotype prediction using metagenome profiles and machine learning.
   BMC Bioinformatics. 2021 Jan 18;22(1):25. doi: 10.1186/s12859-020-03933-4.
3. Gene-based microbiome representation enhances host phenotype classification.
   mSystems. 2023 Aug 31;8(4):e0053123. doi: 10.1128/msystems.00531-23. Epub 2023 Jul 5.
4. Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights.
   PLoS Comput Biol. 2016 Jul 11;12(7):e1004977. doi: 10.1371/journal.pcbi.1004977. eCollection 2016 Jul.
5. A machine learning framework to determine geolocations from metagenomic profiling.
   Biol Direct. 2020 Nov 23;15(1):27. doi: 10.1186/s13062-020-00278-z.
6. Interpretable and accurate prediction models for metagenomics data.
   Gigascience. 2020 Mar 1;9(3). doi: 10.1093/gigascience/giaa010.
7. Massive metagenomic data analysis using abundance-based machine learning.
   Biol Direct. 2019 Aug 1;14(1):12. doi: 10.1186/s13062-019-0242-0.
8. A permutable MLP-like architecture for disease prediction from gut metagenomic data.
   BMC Bioinformatics. 2024 Jul 24;25(1):246. doi: 10.1186/s12859-024-05856-w.
9. Systematic evaluation of supervised machine learning for sample origin prediction using metagenomic sequencing data.
   Biol Direct. 2020 Dec 10;15(1):29. doi: 10.1186/s13062-020-00287-y.
10. Metagenomic evidence for a polymicrobial signature of sepsis.
   Microb Genom. 2021 Sep;7(9). doi: 10.1099/mgen.0.000642.

Cited by

1. MicroHDF: predicting host phenotypes with metagenomic data using a deep forest-based framework.
   Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae530.
2. EnsembleSeq: a workflow towards real-time, rapid, and simultaneous multi-kingdom-amplicon sequencing for holistic and resource-effective microbiome research at scale.
   Microbiol Spectr. 2024 Jun 4;12(6):e0415023. doi: 10.1128/spectrum.04150-23. Epub 2024 Apr 30.
3. Deep learning methods in metagenomics: a review.
   Microb Genom. 2024 Apr;10(4). doi: 10.1099/mgen.0.001231.
4. A toolbox of machine learning software to support microbiome analysis.
   Front Microbiol. 2023 Nov 22;14:1250806. doi: 10.3389/fmicb.2023.1250806. eCollection 2023.