DGCNN approach links metagenome-derived taxon and functional information providing insight into global soil organic carbon.

Affiliations

IBM Research Europe, Sci-Tech Daresbury, The Hartree Centre, Warrington, UK.

STFC Daresbury Laboratory, The Hartree Centre, Warrington, UK.

Publication information

NPJ Biofilms Microbiomes. 2024 Oct 26;10(1):113. doi: 10.1038/s41522-024-00583-9.

Abstract

Metagenomics can provide insight into the microbial taxa present in a sample and, through gene identification, the functional potential of the community. However, taxonomic and functional information are typically considered separately in downstream analyses. We develop interpretable machine learning (ML) approaches for modelling metagenomic data, combining the biological representation of species with their associated genetically encoded functions within models. We apply our methods to investigate soil organic carbon (SOC) stocks. First, we combine a diverse global set of soil microbiome samples with environmental data, improving the predictive performance of classic ML and providing new insights into the role of soil microbiomes in global carbon cycling. Our network analysis of predictive taxa identified by classical ML models provides context for their ecological significance, extending the focus beyond just the most predictive taxa to 'hidden' features within the model that might be considered less predictive using standard methods for explainability. We next develop unique graph representations for individual microbiomes, linking microbial taxa to their associated functions directly, enabling predictions of SOC via deep graph convolutional neural networks (DGCNNs). Interpretation of the DGCNNs distinguished between the importance of functions of key individual species, providing genome sequence differences, e.g., gene loss/acquisition, that associate with SOC. These approaches identify several members of the Verrucomicrobiaceae family and a range of genetically encoded functions, e.g., related to carbohydrate metabolism, as important for SOC stocks and effective global SOC predictors. These relatively understudied but widespread organisms could play an important role in SOC dynamics globally.
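As a rough illustration of the idea of linking taxa to their encoded functions in a per-sample graph, the sketch below builds a small taxon-function graph and applies one graph-convolution step followed by sort pooling. It is not the authors' implementation: the abstract does not give the architecture, so the propagation rule used here (row-normalised adjacency with self-loops and a tanh activation) is the one commonly described for DGCNNs, and the node features, function names (`build_taxon_function_graph`, `graph_conv`, `sort_pool`), and the toy taxa and function labels are all assumptions made for illustration.

```python
import numpy as np


def build_taxon_function_graph(taxon_abundance, taxon_functions):
    """Build one sample's graph: taxa and functions are nodes, and an edge
    links a taxon to each function it encodes (hypothetical representation)."""
    taxa = sorted(set(taxon_abundance) | set(taxon_functions))
    functions = sorted({f for fs in taxon_functions.values() for f in fs})
    nodes = taxa + functions  # assumes taxon and function labels never collide
    index = {name: i for i, name in enumerate(nodes)}

    adjacency = np.zeros((len(nodes), len(nodes)))
    for taxon, fs in taxon_functions.items():
        for f in fs:
            i, j = index[taxon], index[f]
            adjacency[i, j] = adjacency[j, i] = 1.0  # undirected edge

    # Assumed node features: [relative abundance, is_taxon, is_function]
    features = np.zeros((len(nodes), 3))
    for taxon in taxa:
        features[index[taxon]] = [taxon_abundance.get(taxon, 0.0), 1.0, 0.0]
    for f in functions:
        features[index[f]] = [0.0, 0.0, 1.0]
    return nodes, adjacency, features


def graph_conv(adjacency, features, weights):
    """One propagation step: tanh(D^-1 (A + I) X W)."""
    a_tilde = adjacency + np.eye(adjacency.shape[0])  # add self-loops
    degree = a_tilde.sum(axis=1, keepdims=True)       # row sums, shape (n, 1)
    return np.tanh((a_tilde / degree) @ features @ weights)


def sort_pool(node_embeddings, k):
    """Sort nodes by their last channel (descending), then keep or
    zero-pad to exactly k rows so every graph yields a fixed-size output."""
    order = np.argsort(-node_embeddings[:, -1], kind="stable")
    top = node_embeddings[order[:k]]
    if top.shape[0] < k:
        pad = np.zeros((k - top.shape[0], top.shape[1]))
        top = np.vstack([top, pad])
    return top


# Toy sample with made-up taxa and function labels.
abundance = {"taxon_A": 0.6, "taxon_B": 0.4}
functions = {"taxon_A": ["function_X", "function_Y"], "taxon_B": ["function_Y"]}

nodes, A, X = build_taxon_function_graph(abundance, functions)
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))  # untrained weights, for shape checking only
H = graph_conv(A, X, W)
pooled = sort_pool(H, k=3)
```

In a trained model the pooled, fixed-size output would feed further layers that regress SOC; here the weights are random, so only the shapes and the graph structure are meaningful. Because each function node is attached to specific taxa, the same function can carry a different learned importance depending on which organism encodes it.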

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/14ff/11513995/4f5bc81f46f2/41522_2024_583_Fig1_HTML.jpg