Górecki Paweł, Tiuryn Jerzy
Institute of Informatics, Warsaw University, Banacha 2, 02-678 Warsaw, Poland.
Bioinformatics. 2007 Jan 15;23(2):e116-22. doi: 10.1093/bioinformatics/btl296.
Inferring species phylogenies with a history of gene losses and duplications is a challenging and important task in computational biology. This problem can be solved by duplication-loss models in which the primary step is to reconcile a rooted gene tree with a rooted species tree. Most modern methods of phylogenetic reconstruction (from sequences) produce unrooted gene trees. This limitation leads to the problem of transforming an unrooted gene tree into a rooted tree and then reconciling the rooted trees. The main questions are: 'What is the biological interpretation of choosing a rooting?', 'Can we find the optimal rootings efficiently?' and 'Is the optimal rooting unique?'.
In this paper we present a model for reconciling an unrooted gene tree with a rooted species tree, based on the concept of choosing a rooting that has minimal reconciliation cost. Our analysis leads to the surprising property that all the minimal rootings have identical distributions of gene duplications and gene losses in the species tree. This implies, in our opinion, that the concept of an optimal rooting is very robust, and thus biologically meaningful. It also has nice computational properties. We present a linear time and space algorithm for computing the optimal rooting(s). This algorithm was used in two different ways to reconstruct the optimal species phylogeny of five known yeast genomes from approximately 4700 gene trees. Moreover, we determined the locations (history) of all gene duplications and gene losses in the final species tree. It is interesting to note that the top five species trees are the same for both methods.
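To make the idea of a minimal-cost rooting concrete, the following is a minimal brute-force sketch, not the linear time and space algorithm of the paper. It assumes binary trees written as nested tuples whose leaves are species names, the usual least-common-ancestor (LCA) mapping between gene and species trees, and a reconciliation cost equal to the number of duplications plus the number of losses. All function names (`index_species_tree`, `lca`, `reconcile`, `all_rootings`, `best_rootings`) are hypothetical and chosen only for illustration.

```python
def index_species_tree(tree):
    """Return (parent, depth) dicts for a rooted species tree (nested tuples)."""
    parent, depth = {}, {}
    stack = [(tree, None, 0)]
    while stack:
        node, par, d = stack.pop()
        parent[node], depth[node] = par, d
        if isinstance(node, tuple):
            for child in node:
                stack.append((child, node, d + 1))
    return parent, depth


def lca(x, y, parent, depth):
    """Least common ancestor of two species-tree nodes."""
    while x != y:
        if depth[x] < depth[y]:
            x, y = y, x
        x = parent[x]  # always lift the deeper (or equally deep) node
    return x


def reconcile(gene, parent, depth):
    """Return (mapping, duplications, losses) for a rooted gene tree."""
    if isinstance(gene, str):  # a gene leaf maps to its species leaf
        return gene, 0, 0
    (ma, da, la), (mb, db, lb) = (reconcile(c, parent, depth) for c in gene)
    m = lca(ma, mb, parent, depth)
    dup = 1 if m in (ma, mb) else 0  # duplication: a child maps to the same node
    # Assumed standard loss count for the LCA mapping.
    loss = depth[ma] + depth[mb] - 2 * depth[m] - 2 + 2 * dup
    return m, da + db + dup, la + lb + loss


def _rootings(sub, rest):
    """Yield rootings on the edge above `sub` and on every edge inside `sub`."""
    yield (sub, rest)
    if isinstance(sub, tuple):
        a, b = sub
        yield from _rootings(a, (b, rest))
        yield from _rootings(b, (a, rest))


def all_rootings(tree):
    """Yield one rooted tree per edge of the unrooted tree underlying `tree`."""
    left, right = tree
    yield from _rootings(left, right)
    if isinstance(right, tuple):
        c, d = right
        yield from _rootings(c, (d, left))
        yield from _rootings(d, (c, left))


def best_rootings(gene_unrooted, species):
    """Return all (cost, duplications, losses, rooted_tree) of minimal cost."""
    parent, depth = index_species_tree(species)
    scored = []
    for rooted in all_rootings(gene_unrooted):
        _, dups, losses = reconcile(rooted, parent, depth)
        scored.append((dups + losses, dups, losses, rooted))
    best = min(s[0] for s in scored)
    return [s for s in scored if s[0] == best]
```

For example, with the species tree `(("a", "b"), "c")` and the gene tree `(("a", "c"), "b")` treated as unrooted, `best_rootings` tries the three possible rootings and keeps the one that needs no duplications or losses. The sketch scores every rooting separately, so it is slower than linear; it only illustrates the cost being minimized.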
Software and documentation are freely available from http://bioputer.mimuw.edu.pl/~gorecki/urec