EcoFoldDB: Protein Structure-Guided Functional Profiling of Ecologically Relevant Microbial Traits at the Metagenome Scale.

Authors

Ghaly Timothy M, Rajabal Vaheesan, Russell Dylan, Colombi Elena, Tetu Sasha G

Affiliations

School of Natural Sciences, Macquarie University, Sydney, Australia.

ARC Centre of Excellence in Synthetic Biology, Sydney, Australia.

Publication

Environ Microbiol. 2025 Sep;27(9):e70178. doi: 10.1111/1462-2920.70178.

Abstract

Microbial communities are fundamental to planetary health and ecosystem processes. High-throughput metagenomic sequencing has provided unprecedented insights into the structure and function of these communities. However, functionally profiling metagenomes remains constrained due to the limited sensitivity of existing sequence homology-based methods to annotate evolutionarily divergent genes. Protein structure, more conserved than sequence and intrinsically tied to molecular function, offers a solution. Capitalising on recent breakthroughs in structural bioinformatics, we present EcoFoldDB, a database of protein structures curated for ecologically relevant microbial traits, and its companion pipeline, EcoFoldDB-annotate, which leverages Foldseek with the ProstT5 protein language model for rapid structural homology searching directly from sequence data. EcoFoldDB-annotate outperforms state-of-the-art sequence-based methods in annotating metagenomic proteins, in terms of sensitivity and precision. To demonstrate its utility and scalability, we performed structure-guided functional profiling of 32 million proteins encoded by 8000 high-quality metagenome-assembled genomes from the global soil microbiome. EcoFoldDB-annotate could resolve the phylogenetic partitioning of important nitrogen cycling pathways, from taxonomically restricted nitrifiers to more widespread denitrifiers, as well as identifying novel, uncultivated bacterial taxa enriched in plant growth-promoting traits. We anticipate that EcoFoldDB will enable researchers to extract ecological insights from environmental genomes and metagenomes and accelerate discoveries in microbial ecology.

