Dufayard Jean-François, Duret Laurent, Penel Simon, Gouy Manolo, Rechenmann François, Perrière Guy
INRIA Rhône-Alpes 38334 Montbonnot, Saint Ismier Cedex, France.
Bioinformatics. 2005 Jun 1;21(11):2596-603. doi: 10.1093/bioinformatics/bti325. Epub 2005 Feb 15.
Comparative sequence analysis is widely used to study genome function and evolution. This approach first requires the identification of homologous genes and then the interpretation of their homology relationships (orthology or paralogy). To provide help in this complex task, we developed three databases of homologous genes containing sequences, multiple alignments and phylogenetic trees: HOBACGEN, HOVERGEN and HOGENOM. In this paper, we present two new tools for automating the search for orthologs or paralogs in these databases.
First, we have developed and implemented an algorithm to infer speciation and duplication events by comparison of gene and species trees (tree reconciliation). Second, we have developed a general method to search our databases for the gene families whose tree topology matches a given tree pattern. This unordered tree pattern matching algorithm has been implemented in the FamFetch graphical interface. With the help of a graphical editor, the user can specify the topology of the tree pattern and set constraints on its nodes and leaves. The pattern is then compared with all the phylogenetic trees in the database to retrieve the families in which one or several occurrences of the pattern are found. By specifying ad hoc patterns, it is therefore possible to identify orthologs in our databases.
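The two ideas above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes binary trees written as nested tuples, gene leaves named `gene_species`, the classic lowest-common-ancestor mapping for reconciliation, and a simplified pattern matcher in which a pattern leaf is a species name and a pattern must match a whole subtree exactly, up to child order. All function names are hypothetical.

```python
from itertools import permutations

def species_of(gene_leaf):
    # Assumed naming convention: "geneid_species".
    return gene_leaf.rsplit("_", 1)[1]

def leaves(tree):
    # Leaf labels below a node (works for species trees and gene trees).
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def all_clades(tree):
    # Leaf set of every node of the species tree.
    result = [leaves(tree)]
    if not isinstance(tree, str):
        for child in tree:
            result.extend(all_clades(child))
    return result

def lca_map(species_set, clades):
    # The smallest clade containing the set plays the role of the LCA.
    return min((c for c in clades if species_set <= c), key=len)

def reconcile(gene_tree, clades, events=None):
    # Post-order traversal; returns (mapped clade, list of (gene leaves, event)).
    if events is None:
        events = []
    if isinstance(gene_tree, str):
        return lca_map(frozenset([species_of(gene_tree)]), clades), events
    child_maps = [reconcile(c, clades, events)[0] for c in gene_tree]
    here = lca_map(frozenset().union(*child_maps), clades)
    # A node mapped to the same species clade as one of its children
    # is inferred to be a duplication; otherwise it is a speciation.
    kind = "duplication" if here in child_maps else "speciation"
    events.append((sorted(leaves(gene_tree)), kind))
    return here, events

def matches(pattern, node):
    # Unordered match: children may be paired in any order.
    if isinstance(pattern, str):
        return isinstance(node, str) and species_of(node) == pattern
    if isinstance(node, str) or len(node) != len(pattern):
        return False
    return any(all(matches(p, n) for p, n in zip(pattern, perm))
               for perm in permutations(node))

def find_occurrences(pattern, tree):
    # Collect every subtree of the gene tree that matches the pattern.
    found = [tree] if matches(pattern, tree) else []
    if not isinstance(tree, str):
        for child in tree:
            found.extend(find_occurrences(pattern, child))
    return found

# Toy data (invented for illustration only).
S = (("human", "mouse"), "chicken")
G = ((("A1_human", "A1_mouse"), ("A2_human", "A2_mouse")), "A_chicken")

if __name__ == "__main__":
    for genes, kind in reconcile(G, all_clades(S))[1]:
        print(kind, genes)
    print(find_occurrences(("mouse", "human"), G))
```

In this toy gene tree, the node joining the two human/mouse pairs maps to the same species clade as its children and is therefore labelled a duplication, so the two pairs are paralogous to each other. A real pattern search would additionally need node constraints and patterns that match only part of a subtree, which this sketch leaves out.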