Department of Civil and Environmental Engineering, University of Washington, Seattle, Washington, USA.
Phase Genomics, Seattle, Washington, USA.
mSystems. 2024 Aug 20;9(8):e0057324. doi: 10.1128/msystems.00573-24. Epub 2024 Jul 9.
Metagenomic sequencing has advanced our understanding of biogeochemical processes by providing an unprecedented view into the microbial composition of different ecosystems. While the amount of metagenomic data has grown rapidly, simple-to-use methods to analyze and compare across studies have lagged behind. Thus, tools expressing the metabolic traits of a community are needed to broaden the utility of existing data. Gene abundance profiles are a relatively low-dimensional embedding of a metagenome's functional potential and are, thus, tractable for comparison across many samples. Here, we compare the abundance of KEGG Ortholog Groups (KOs) from 6,539 metagenomes from the Joint Genome Institute's Integrated Microbial Genomes and Metagenomes (JGI IMG/M) database. We find that samples cluster into terrestrial, aquatic, and anaerobic ecosystems with marker KOs reflecting adaptations to these environments. For instance, functional clusters were differentiated by the metabolism of antibiotics, photosynthesis, methanogenesis, and, surprisingly, GC content. Using this functional gene approach, we reveal the broad-scale patterns shaping microbial communities and demonstrate the utility of ortholog abundance profiles for representing a rapidly expanding body of metagenomic data.
Metagenomics, or the sequencing of DNA from complex microbiomes, provides a view into the microbial composition of different environments. Metagenome databases were created to compile sequencing data across studies, but it remains challenging to compare and gain insight from these large data sets. Consequently, there is a need to develop accessible approaches to extract knowledge across metagenomes. The abundance of different orthologs (i.e., genes that perform a similar function across species) provides a simplified representation of a metagenome's metabolic potential that can easily be compared with others. In this study, we cluster the ortholog abundance profiles of thousands of metagenomes from diverse environments and uncover the traits that distinguish them. This work provides a simple-to-use framework for functional comparison and advances our understanding of how the environment shapes microbial communities.
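As a rough illustration of what clustering ortholog abundance profiles can look like, the sketch below normalizes per-sample KO counts to relative abundances, compares samples with Bray-Curtis dissimilarity, and groups them by average-linkage agglomerative clustering. These choices are assumptions made for illustration only; the text above does not specify the normalization, distance measure, or clustering algorithm used in the study. The sample names, the KO labels (`KO_A`, `KO_B`, `KO_C`), and all counts are hypothetical placeholders, not data from JGI IMG/M.

```python
from itertools import combinations


def relative_abundance(counts):
    """Scale one sample's raw KO counts so they sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no KO counts")
    return {ko: n / total for ko, n in counts.items()}


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical)."""
    kos = set(p) | set(q)
    num = sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in kos)
    den = sum(p.get(k, 0.0) + q.get(k, 0.0) for k in kos)
    return num / den


def cluster_profiles(profiles, n_clusters):
    """Average-linkage agglomerative clustering on Bray-Curtis distances."""
    names = sorted(profiles)
    dist = {
        frozenset((a, b)): bray_curtis(profiles[a], profiles[b])
        for a, b in combinations(names, 2)
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical raw KO counts per metagenome (placeholder labels and values).
raw_counts = {
    "soil_1": {"KO_A": 80, "KO_B": 15, "KO_C": 5},
    "soil_2": {"KO_A": 70, "KO_B": 20, "KO_C": 10},
    "lake_1": {"KO_A": 10, "KO_B": 85, "KO_C": 5},
    "lake_2": {"KO_A": 15, "KO_B": 75, "KO_C": 10},
    "digester_1": {"KO_A": 5, "KO_B": 10, "KO_C": 85},
    "digester_2": {"KO_A": 10, "KO_B": 5, "KO_C": 85},
}

profiles = {name: relative_abundance(c) for name, c in raw_counts.items()}
clusters = cluster_profiles(profiles, n_clusters=3)
for members in clusters:
    print(members)
```

In this toy input each pair of samples is dominated by a different placeholder KO, so the three groups separate cleanly. Real profiles span thousands of KOs, and at that scale an optimized library implementation would likely be preferable to this pure-Python loop.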