Schloss Patrick D, Handelsman Jo
Department of Microbiology, University of Massachusetts - Amherst, Amherst, MA 01003, USA.
BMC Bioinformatics. 2008 Jan 23;9:34. doi: 10.1186/1471-2105-9-34.
The 99% of bacteria in the environment that are recalcitrant to culturing have spurred the development of metagenomics, a culture-independent approach to sample and characterize microbial genomes. Massive datasets of metagenomic sequences have been accumulated, but analysis of these sequences has focused primarily on the descriptive comparison of the relative abundance of proteins that belong to specific functional categories. More robust statistical methods are needed to make inferences from metagenomic data. In this study, we developed and applied a suite of tools to describe and compare the richness, membership, and structure of microbial communities using peptide fragment sequences extracted from metagenomic sequence data.
Application of these tools to acid mine drainage, soil, and whale fall metagenomic sequence collections revealed groups of peptide fragments with a relatively high abundance and no known function. When combined with analysis of 16S rRNA gene fragments from the same communities, these tools enabled us to demonstrate that although there was no overlap in the types of 16S rRNA gene sequences observed, there was a core collection of operational protein families that was shared among the three environments.
The results of comparisons among the three habitats were surprising considering the relatively low overlap of membership and the distinctly different characteristics of the three habitats. These tools will facilitate the use of metagenomics to pursue statistically sound genome-based ecological analyses.
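To make the ideas of richness and shared membership concrete, the following is a minimal sketch in Python. The abstract does not name the estimators used, so the bias-corrected Chao1 richness estimator and the Jaccard index shown here are assumptions drawn from common ecological practice, not the authors' implementation. The family names and counts are hypothetical toy data.

```python
from collections import Counter


def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-family counts.

    Assumption: this estimator is used for illustration only; the
    abstract does not state which richness estimator the tools apply.
    """
    observed = sum(1 for c in counts.values() if c > 0)
    singletons = sum(1 for c in counts.values() if c == 1)
    doubletons = sum(1 for c in counts.values() if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def jaccard(counts_a, counts_b):
    """Fraction of families shared between two communities (membership only)."""
    a = {name for name, c in counts_a.items() if c > 0}
    b = {name for name, c in counts_b.items() if c > 0}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical counts of peptide fragments per operational protein family.
soil = Counter({"fam1": 5, "fam2": 1, "fam3": 1, "fam4": 2})
whale_fall = Counter({"fam1": 3, "fam4": 1, "fam5": 1})

print("Estimated richness (soil):", chao1(soil))
print("Shared membership (soil vs. whale fall):", jaccard(soil, whale_fall))
```

Richness here depends on how many families are seen only once or twice, while the Jaccard index ignores abundance and reflects membership alone. Comparing community structure would additionally weight families by their relative abundance.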