TCS Research, Tata Consultancy Services Ltd, Pune 411 013, India; CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), New Delhi 110 025, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201 002, India. Electronic address: https://twitter.com/NagpalSun.
TCS Research, Tata Consultancy Services Ltd, Pune 411 013, India.
J Mol Biol. 2022 Jun 15;434(11):167589. doi: 10.1016/j.jmb.2022.167589. Epub 2022 Apr 18.
Identifying environment-specific marker features is a key objective of many metagenomic studies: the aim is to find features in microbiome datasets that can serve as markers of the contrasting or comparable states under study. Hypothesis testing and black-box machine-learned models, which are conventionally used to identify such features, are generally not exhaustive, chiefly because they do not provide any quantifiable relevance (context) of, or between, the identified features. We present the MarkerML web server, which leverages the emergence of interpretable machine learning to facilitate the contextual discovery of metagenomic features of interest. It does so through a comprehensive and automated application of Shapley Additive Explanations (SHAP) alongside compositionality-aware hypothesis testing for multivariate microbiome datasets. MarkerML not only helps identify marker features but also offers insight into the role and interdependence of those features in driving the decisions of the supervised machine-learned model. The platform generates high-quality, intuitive visualizations, including prediction effect plots, model performance reports, feature dependency plots, Shapley- and abundance-informed cladograms (Sungrams), and hypothesis-tested violin plots, and it includes provisions for excluding participant bias and ensuring reproducibility of results. Together these aim to make it a useful asset for scientists in the microbiome field and beyond. The MarkerML web server is freely available to the academic community at https://microbiome.igib.res.in/markerml/.
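To make the two ideas named in the abstract concrete, the following is a minimal, self-contained sketch and not MarkerML's implementation. It assumes a centred log-ratio (CLR) transform as one common way to account for compositionality (the abstract does not say which method the server uses), and it computes exact Shapley values by enumerating feature coalitions, which is only feasible for a handful of features. The names `clr`, `shapley_values`, `toy_model`, the pseudocount of 0.5, and the example counts are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of one sample's feature counts.

    The pseudocount (an assumption here) avoids log(0) for absent taxa.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    Cost grows as 2^n, so this is for illustration only.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    """Hypothetical classifier score with one interaction term."""
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[2]


sample = clr([120, 30, 5])      # made-up counts for three taxa
reference = clr([50, 50, 50])   # even community, CLR values near zero
phi = shapley_values(toy_model, sample, reference)

for name, contribution in zip(["taxon_A", "taxon_B", "taxon_C"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The per-feature contributions sum to the difference between the model's score for the sample and its score for the reference, which is the additivity property that makes Shapley-based explanations quantifiable. Practical tools typically approximate these values rather than enumerating every coalition.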