Merkeev Igor V, Novichkov Pavel S, Mironov Andrey A
State Scientific Center GosNIIGenetica, 1st Dorozhny pr., 1, Moscow, 113545, Russia.
BMC Evol Biol. 2006 Jun 22;6:52. doi: 10.1186/1471-2148-6-52.
Orthologs and paralogs are widely used terms in modern comparative genomics. Existing procedures for resolving orthologous/paralogous relationships often rely on manual revision of clusters of orthologous groups and/or lack a rigorous evolutionary basis.
We developed a completely automated procedure that creates clusters of orthologous groups at each node of the taxonomy tree (PHOGs, Phylogenetic Orthologous Groups). The result of this procedure is a tree of orthologous groups. Each cluster is a "supergene" and is represented by an "ancestral" sequence obtained from the multiple alignment of orthologous and paralogous genes. The procedure has been applied to the taxonomy tree of organisms from all three domains of life. Protein complements from 50 bacterial, archaeal and eukaryotic species were used to create PHOGs at all tree nodes. At the root node, 51367 PHOGs were obtained.
The PHOG database demonstrates that it is possible to automatically process any number of sequenced genomes and to reconstruct orthologous and paralogous relationships between genomes using a rigorous evolutionary approach. This database can become a very useful tool in various areas of comparative genomics.
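The idea of building orthologous groups at every node of a taxonomy tree can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Cluster`, `TaxonNode`, `similarity` and `build_groups`, the 0.8 threshold, the greedy merging rule and the string-similarity score are all assumptions made for illustration. In particular, the sketch simply keeps the first sequence of a cluster as its representative, whereas the abstract states that the "ancestral" sequence is derived from a multiple alignment.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Cluster:
    """One group ("supergene") at a taxonomy node (hypothetical structure)."""
    representative: str  # toy stand-in for the "ancestral" sequence
    members: list[str] = field(default_factory=list)  # gene identifiers


@dataclass
class TaxonNode:
    """A node of the taxonomy tree; only leaves carry proteins."""
    name: str
    children: list["TaxonNode"] = field(default_factory=list)
    proteins: dict[str, str] = field(default_factory=dict)  # gene id -> sequence


def similarity(a: str, b: str) -> float:
    # Assumption: a plain string ratio replaces a real alignment score.
    return SequenceMatcher(None, a, b).ratio()


def build_groups(node, threshold=0.8, groups=None):
    """Post-order traversal: children are clustered before their parent.

    Returns a dict mapping each node name to its list of clusters.
    """
    groups = {} if groups is None else groups
    if not node.children:
        # At a leaf (one species), every protein starts as its own cluster.
        groups[node.name] = [
            Cluster(seq, [gene_id]) for gene_id, seq in node.proteins.items()
        ]
        return groups

    merged = []
    for child in node.children:
        build_groups(child, threshold, groups)
        for cand in groups[child.name]:
            # Greedy rule (an assumption): join the first cluster whose
            # representative is similar enough, otherwise start a new one.
            target = next(
                (c for c in merged
                 if similarity(c.representative, cand.representative) >= threshold),
                None,
            )
            if target is None:
                # Copy so the child's clusters are left unchanged.
                merged.append(Cluster(cand.representative, list(cand.members)))
            else:
                target.members.extend(cand.members)
    groups[node.name] = merged
    return groups


# Toy example with two species under one root (made-up sequences).
species_a = TaxonNode("A", proteins={"a1": "MKTAYIAKQR", "a2": "GGGGSSSSPP"})
species_b = TaxonNode("B", proteins={"b1": "MKTAYIAKQK", "b2": "WWWWYYYYFF"})
root = TaxonNode("root", children=[species_a, species_b])

for cluster in build_groups(root)["root"]:
    print(cluster.members)
```

Because each node stores its own list of clusters, the returned dictionary mirrors the "tree of orthologous groups" described above: groups at a deeper node are refinements of the groups at its ancestors.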