Department of Biology, Syracuse University, Syracuse, New York, United States of America.
PLoS One. 2007 Aug 15;2(8):e743. doi: 10.1371/journal.pone.0000743.
Automated DNA sequencing technology is now so rapid that analysis has become the rate-limiting step. Hundreds of prokaryotic genome sequences are publicly available, with new genomes uploaded at a rate of approximately 20 per month. As a result, this growing body of genome sequences will include microorganisms not previously identified, isolated, or observed. We hypothesize that the evolutionary pressure exerted by an ecological niche selects for a similar genetic repertoire in the prokaryotes that occupy that niche, and that this similarity arises through both vertical and horizontal transmission. To test this hypothesis, we have developed a novel method that classifies prokaryotes by calculating their Pfam protein domain distributions and clustering them with all other sequenced prokaryotic species. Clusters of organisms are visualized in two dimensions as 'mountains' on a topological map. When compared to a phylogenetic map constructed using 16S rRNA, this map more accurately clusters prokaryotes according to functional and environmental attributes. We demonstrate, both quantitatively and qualitatively, the ability of this map, which we term a "niche map", to cluster prokaryotes according to ecological niche. We propose that this method be used to associate uncharacterized prokaryotes with their ecological niche as a means of predicting their functional role directly from their genome sequence.
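The idea of comparing genomes by their protein domain distributions can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which distance measure or clustering algorithm was used, so cosine distance and threshold-based single-linkage grouping are assumptions chosen for brevity. The genome names, the domain labels (`domA` to `domE`), and the counts are invented toy inputs, not real Pfam annotations.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def domain_profile(domain_hits, vocabulary):
    """Convert one genome's list of domain hits into relative frequencies."""
    counts = Counter(domain_hits)
    total = sum(counts.values())
    return [counts[d] / total for d in vocabulary]


def cosine_distance(u, v):
    """1 - cosine similarity (an assumed metric, not taken from the paper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def cluster(profiles, threshold):
    """Single-linkage grouping: link any two genomes closer than `threshold`."""
    parent = {name: name for name in profiles}

    def find(x):
        # Follow parent links to the representative of x's group.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if cosine_distance(profiles[a], profiles[b]) < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy data: each genome is a list of domain hits.
genomes = {
    "genome_A": ["domA", "domA", "domB", "domC"],
    "genome_B": ["domA", "domA", "domA", "domB", "domC"],
    "genome_C": ["domD", "domD", "domE", "domB"],
}

vocabulary = sorted({d for hits in genomes.values() for d in hits})
profiles = {name: domain_profile(hits, vocabulary) for name, hits in genomes.items()}
clusters = cluster(profiles, threshold=0.2)  # threshold is an arbitrary choice
print(clusters)
```

In this toy case `genome_A` and `genome_B` share most of their domain content and fall into one group, while `genome_C` stands alone. A real analysis would start from Pfam annotations of complete genomes and would need a principled choice of distance measure and clustering method.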