Górecki Pawel, Eulenstein Oliver
BMC Bioinformatics. 2014;15 Suppl 13(Suppl 13):S3. doi: 10.1186/1471-2105-15-S13-S3. Epub 2014 Nov 13.
Evolutionary studies are complicated by discordance between gene trees and the species tree in which they evolved. Dealing with discordant trees often relies on comparison costs between gene and species trees, including the well-established Robinson-Foulds, gene duplication, and deep coalescence costs. While these costs have provided credible results for binary rooted gene trees, corresponding cost definitions for non-binary unrooted gene trees, which are frequently occurring in practice, are challenged by biological realism.
We propose a natural extension of the well-established costs for comparing unrooted and non-binary gene trees with rooted binary species trees using a binary refinement model. For the duplication cost we describe an efficient algorithm that is based on a linear time reduction and also computes an optimal rooted binary refinement of the given gene tree. Finally, we show that similar reductions lead to solutions for computing the deep coalescence and the Robinson-Foulds costs.
Our binary refinement of Robinson-Foulds, gene duplication, and deep coalescence costs for unrooted and non-binary gene trees together with the linear time reductions provided here for computing these costs significantly extends the range of trees that can be incorporated into approaches dealing with discordance.
基因树与其进化所在的物种树之间的不一致使进化研究变得复杂。处理不一致的树通常依赖于基因树和物种树之间的比较成本,包括已确立的罗宾逊 - 福尔兹成本、基因复制成本和深度合并成本。虽然这些成本为有根二叉基因树提供了可靠的结果,但对于实践中经常出现的无根非二叉基因树,相应的成本定义受到生物学现实的挑战。
我们提出了一种自然扩展,使用二叉细化模型将已确立的用于比较无根和非二叉基因树与有根二叉物种树的成本进行扩展。对于复制成本,我们描述了一种基于线性时间归约的高效算法,该算法还计算给定基因树的最优有根二叉细化。最后,我们表明类似的归约可得出计算深度合并成本和罗宾逊 - 福尔兹成本的解决方案。
我们对无根和非二叉基因树的罗宾逊 - 福尔兹成本、基因复制成本和深度合并成本进行的二叉细化,以及此处提供的用于计算这些成本的线性时间归约,显著扩展了可纳入处理不一致性方法的树的范围。