National Center for Biotechnology Information, NLM, National Institutes of Health, Bethesda, MD 20894, USA.
Biol Direct. 2012 Dec 14;7:46. doi: 10.1186/1745-6150-7-46.
Collections of Clusters of Orthologous Genes (COGs) provide indispensable tools for comparative genomic analysis, evolutionary reconstruction and functional annotation of new genomes. Initially, COGs were made for all complete genomes of cellular life forms that were available at the time. However, with the accumulation of thousands of complete genomes, construction of a comprehensive COG set has become extremely computationally demanding and prone to error propagation, necessitating the switch to taxon-specific COG collections. Previously, we reported the collection of COGs for 41 genomes of Archaea (arCOGs). Here we present a major update of the arCOGs and describe evolutionary reconstructions to reveal general trends in the evolution of Archaea.
The updated version of the arCOG database incorporates 91% of the pangenome of 120 archaea (251,032 protein-coding genes altogether) into 10,335 arCOGs. Using this new set of arCOGs, we performed maximum likelihood reconstruction of the genome content of archaeal ancestral forms and of gene gain and loss events in archaeal evolution. This reconstruction shows that the last common ancestor of the extant Archaea was an organism of greater complexity than most of the extant archaea, probably with over 2,500 protein-coding genes. The subsequent evolution of almost all archaeal lineages was apparently dominated by gene loss resulting in genome streamlining. Overall, in the evolution of Archaea, as well as of a representative set of bacteria that was similarly analyzed for comparison, gene losses are estimated to outnumber gene gains by at least 4 to 1. Analysis of specific patterns of gene gain in Archaea shows that, although some groups, in particular Halobacteria, acquire substantially more genes than others, on the whole, gene exchange between major groups of Archaea appears to be largely random, with no major 'highways' of horizontal gene transfer.
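As a rough back-of-the-envelope check on the figures quoted above, the following Python sketch derives a few summary numbers. It assumes that the 251,032 protein-coding genes are the full pangenome and that the 91% coverage applies to that total; the abstract does not state this explicitly, and all variable names are illustrative only.

```python
# Figures quoted in the abstract.
TOTAL_GENES = 251_032       # protein-coding genes in the 120 archaeal genomes
FRACTION_IN_ARCOGS = 0.91   # share of the pangenome assigned to arCOGs
NUM_ARCOGS = 10_335
NUM_GENOMES = 120

# ASSUMPTION: the 91% coverage is taken over the 251,032-gene total.
genes_in_arcogs = round(TOTAL_GENES * FRACTION_IN_ARCOGS)

# Average cluster size and average genome size implied by those figures.
mean_genes_per_arcog = genes_in_arcogs / NUM_ARCOGS
mean_genes_per_genome = TOTAL_GENES / NUM_GENOMES

print(f"genes assigned to arCOGs: {genes_in_arcogs}")
print(f"mean genes per arCOG:     {mean_genes_per_arcog:.1f}")
print(f"mean genes per genome:    {mean_genes_per_genome:.0f}")
```

Under these assumptions, the mean extant genome carries roughly 2,100 protein-coding genes, somewhat below the estimate of over 2,500 genes for the last common ancestor. This is only an average and says nothing about individual lineages.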
The updated collection of arCOGs is expected to become a key resource for comparative genomics, evolutionary reconstruction and functional annotation of new archaeal genomes. Given that, in spite of the major increase in the number of genomes, the conserved core of archaeal genes appears to be stabilizing, the major evolutionary trends revealed here have a chance to stand the test of time.
This article was reviewed by Dr. PLG, Prof. PF and Dr. PL (nominated by Prof. JPG). For complete reviews, see the Reviewers' Reports section.