Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA.
BMC Evol Biol. 2014 Oct 9;14:207. doi: 10.1186/s12862-014-0207-y.
Phylogenetic studies have provided detailed knowledge of the evolutionary mechanisms of genes and species in Bacteria and Archaea. However, the evolution of cellular functions, represented by metabolic pathways and biological processes, has not been systematically characterized. Many clades in the prokaryotic tree of life are now covered by sequenced genomes in GenBank, which enables a large-scale functional phylogenomics study of computationally inferred cellular functions across all sequenced prokaryotes.
A total of 14,727 GenBank prokaryotic genomes were re-annotated using a new protein family database, UniFam, to obtain consistent functional annotations for accurate comparison. The functional profile of each genome was represented by the biological process Gene Ontology (GO) terms in its annotation. GO term enrichment analysis differentiated the functional profiles of selected archaeal taxa. A total of 706 prokaryotic metabolic pathways were inferred from these genomes using Pathway Tools and MetaCyc. The consistency between the distribution of each metabolic pathway across the genomes and the phylogenetic tree of those genomes was measured using parsimony scores and retention indices. The ancestral functional profiles at the internal nodes of the phylogenetic tree were reconstructed to track the gains and losses of metabolic pathways over evolutionary history.
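To make the parsimony score and retention index concrete, the following is a minimal sketch rather than the authors' pipeline. It assumes a rooted binary tree written as nested tuples, a pathway coded as present (1) or absent (0) in each genome, and Fitch's small-parsimony algorithm as the scoring method; the names `fitch_score`, `retention_index`, and the eight toy genomes are hypothetical. The retention index is computed as (g - s) / (g - m), where s is the observed number of changes on the tree, m is the minimum possible number of changes, and g is the maximum possible number of changes on any tree.

```python
from typing import Dict, Optional, Set, Tuple, Union

# A tree is either a leaf name or a pair of subtrees (assumed rooted and binary).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_score(tree: Tree, states: Dict[str, int]) -> Tuple[Set[int], int]:
    """Fitch bottom-up pass: return (candidate state set, number of changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch_score(tree[0], states)
    right_set, right_steps = fitch_score(tree[1], states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no change needed at this node.
        return common, left_steps + right_steps
    # Children disagree: take the union and count one change.
    return left_set | right_set, left_steps + right_steps + 1


def retention_index(tree: Tree, states: Dict[str, int]) -> Optional[float]:
    """Retention index for one binary (0/1) character; None if undefined."""
    _, observed = fitch_score(tree, states)        # s
    present = sum(states.values())
    absent = len(states) - present
    max_steps = min(present, absent)               # g for a binary character
    min_steps = 1 if max_steps > 0 else 0          # m for a binary character
    if max_steps == min_steps:
        return None  # character is uninformative; RI is undefined
    return (max_steps - observed) / (max_steps - min_steps)


# Toy tree with eight hypothetical genomes in two clades.
tree: Tree = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))

# Pathway found only in the first clade: one change explains its distribution.
clade_specific = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 0, "F": 0, "G": 0, "H": 0}

# Pathway scattered across both clades: many independent changes are required.
dispersed = {"A": 1, "B": 0, "C": 1, "D": 0, "E": 1, "F": 0, "G": 1, "H": 0}

for name, profile in [("clade_specific", clade_specific), ("dispersed", dispersed)]:
    steps = fitch_score(tree, profile)[1]
    print(name, "parsimony score:", steps, "RI:", retention_index(tree, profile))
```

In this toy case the clade-specific pathway has a parsimony score of 1 and a retention index of 1.0, while the dispersed pathway has a score of 4 and a retention index of 0.0. The state sets computed at internal nodes are also the first pass of a Fitch-style ancestral state reconstruction. The recursive form is for readability only; a tree with thousands of genomes would need an iterative traversal.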
Our functional phylogenomics analysis shows that taxa and clades have divergent functional profiles. This function-phylogeny correlation stems from a set of clade-specific cellular functions with low parsimony scores. In contrast, many cellular functions are sparsely dispersed across many clades and have high parsimony scores. These two types of cellular functions show distinct evolutionary patterns when reconstructed on the prokaryotic tree.