Comprehensive Functional Annotation of Metagenomes and Microbial Genomes Using a Deep Learning-Based Method.

Affiliations

Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland.

Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, New York, USA.

Publication information

mSystems. 2023 Apr 27;8(2):e0117822. doi: 10.1128/msystems.01178-22. Epub 2023 Mar 7.

Abstract

Comprehensive protein function annotation is essential for understanding microbiome-related disease mechanisms in host organisms. However, a large portion of human gut microbial proteins lack functional annotation. Here, we have developed a new metagenome analysis workflow integrating genome reconstruction, taxonomic profiling, and deep learning-based functional annotations from DeepFRI. This is the first approach to apply deep learning-based functional annotations in metagenomics. We validate DeepFRI functional annotations by comparing them to orthology-based annotations from eggNOG on a set of 1,070 infant metagenomes from the DIABIMMUNE cohort. Using this workflow, we generated a sequence catalogue of 1.9 million nonredundant microbial genes. The functional annotations revealed 70% concordance between Gene Ontology annotations predicted by DeepFRI and eggNOG. DeepFRI improved the annotation coverage, with 99% of the gene catalogue obtaining Gene Ontology molecular function annotations, although these annotations are less specific than those from eggNOG. Additionally, we constructed pangenomes in a reference-free manner using high-quality metagenome-assembled genomes (MAGs) and analyzed the associated annotations. eggNOG annotated more genes in well-studied organisms, such as Escherichia coli, while DeepFRI was less sensitive to taxa. Further, we show that DeepFRI provides additional annotations in comparison to the previous DIABIMMUNE studies. This workflow will contribute to a novel understanding of the functional signature of the human gut microbiome in health and disease and will guide future metagenomics studies.

The past decade has seen advances in high-throughput sequencing technologies, resulting in the rapid accumulation of genomic data from microbial communities. While this growth in sequence data and gene discovery is impressive, the majority of microbial gene functions remain uncharacterized. The coverage of functional information coming from either experimental sources or inferences is low. To address these challenges, we have developed a new workflow to computationally assemble microbial genomes and annotate the genes using a deep learning-based model, DeepFRI. This improved microbial gene annotation coverage to 1.9 million metagenome-assembled genes, representing 99% of the assembled genes, which is a significant improvement over the 12% Gene Ontology term annotation coverage achieved by commonly used orthology-based approaches. Importantly, the workflow supports pangenome reconstruction in a reference-free manner, allowing us to analyze the functional potential of individual bacterial species. We therefore propose this alternative approach, which combines deep learning-based functional predictions with the commonly used orthology-based annotations, as one that could help uncover novel functions observed in metagenomic microbiome studies.
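The two headline metrics above, annotation coverage and concordance between two annotation sources, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not define how concordance was computed, so the sketch assumes a gene counts as concordant when both tools annotate it and they share at least one Gene Ontology term. The function names, gene identifiers, and toy annotations are hypothetical.

```python
from typing import Dict, Iterable, Set

# Hypothetical representation: gene ID -> set of GO term IDs assigned by one tool.
Annotations = Dict[str, Set[str]]


def coverage(genes: Iterable[str], annotations: Annotations) -> float:
    """Fraction of catalogue genes that received at least one GO term."""
    genes = list(genes)
    if not genes:
        return 0.0
    annotated = sum(1 for g in genes if annotations.get(g))
    return annotated / len(genes)


def concordance(a: Annotations, b: Annotations) -> float:
    """Among genes annotated by both tools, the fraction sharing >= 1 GO term.

    Assumed definition for illustration only; the source does not specify one.
    """
    both = [g for g in a if a[g] and b.get(g)]
    if not both:
        return 0.0
    agreeing = sum(1 for g in both if a[g] & b[g])
    return agreeing / len(both)


# Toy data (made up, not from the DIABIMMUNE catalogue).
genes = ["g1", "g2", "g3", "g4"]
deepfri_like = {
    "g1": {"GO:0003824"},
    "g2": {"GO:0005215"},
    "g3": {"GO:0003677"},
    "g4": set(),
}
eggnog_like = {
    "g1": {"GO:0003824", "GO:0016787"},
    "g2": {"GO:0003677"},
}

print(coverage(genes, deepfri_like))            # 0.75
print(coverage(genes, eggnog_like))             # 0.5
print(concordance(deepfri_like, eggnog_like))   # 0.5
```

A stricter definition, for example one that requires agreement at the same depth of the Gene Ontology hierarchy, would give lower concordance values. This matters when comparing a tool that predicts general terms with one that assigns more specific terms.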

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/66ac/10134832/caa2bfa11cb2/msystems.01178-22-f001.jpg
