Department of Computer Science, Iowa State University, Ames, IA 50011, USA.
BMC Bioinformatics. 2012 Jun 25;13 Suppl 10(Suppl 10):S11. doi: 10.1186/1471-2105-13-S10-S11.
Gene tree-species tree reconciliation problems seek to infer the patterns and processes of gene evolution within a species tree. Gene tree parsimony approaches seek the evolutionary scenario that implies the fewest gene duplications, duplications and losses, or deep coalescence (incomplete lineage sorting) events needed to reconcile a gene tree with a species tree. While a gene tree parsimony approach can be informative about genome evolution and phylogenetics, error in gene trees can profoundly bias the results.
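To make the duplication cost concrete, the following is a minimal sketch of the standard LCA-mapping way of counting duplications. It is not the authors' implementation. It assumes rooted binary trees written as nested Python tuples, with every gene-tree leaf labeled by the species it was sampled from. The function names are illustrative only.

```python
def leaf_set(tree):
    """Return the set of leaf labels below a nested-tuple tree node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return leaf_set(tree[0]) | leaf_set(tree[1])


def species_clusters(species_tree):
    """List the leaf set (cluster) of every node of the species tree."""
    clusters = [leaf_set(species_tree)]
    if not isinstance(species_tree, str):
        clusters += species_clusters(species_tree[0])
        clusters += species_clusters(species_tree[1])
    return clusters


def lca_cluster(species, clusters):
    """LCA mapping: the smallest species-tree cluster containing `species`."""
    return min((c for c in clusters if species <= c), key=len)


def duplication_cost(gene_tree, species_tree):
    """Count gene-tree nodes that map to the same species node as a child."""
    clusters = species_clusters(species_tree)

    def walk(node):
        # Returns (cluster this gene node maps to, duplications in its subtree).
        if isinstance(node, str):
            return lca_cluster(frozenset([node]), clusters), 0
        left_map, left_dups = walk(node[0])
        right_map, right_dups = walk(node[1])
        mapped = lca_cluster(left_map | right_map, clusters)
        is_duplication = mapped == left_map or mapped == right_map
        return mapped, left_dups + right_dups + int(is_duplication)

    return walk(gene_tree)[1]


species = (("A", "B"), "C")
congruent = duplication_cost((("A", "B"), "C"), species)   # 0 duplications
conflicting = duplication_cost((("A", "C"), "B"), species)  # 1 duplication
```

In the second call the gene tree groups A with C, so both that node and the root map to the species-tree root, and the root is counted as one duplication. A single misplaced branch of this kind is the sort of gene tree error that inflates the reconciliation cost.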
We introduce efficient algorithms that rapidly search local Subtree Prune and Regraft (SPR) or Tree Bisection and Reconnection (TBR) neighborhoods of a given gene tree to identify a topology that implies the fewest duplications, duplications and losses, or deep coalescence events. These algorithms improve on the current solutions by a factor of n for searching SPR neighborhoods and n² for searching TBR neighborhoods, where n is the number of taxa in the given gene tree. They provide a fast error correction protocol that ameliorates the effects of gene tree error by allowing small rearrangements in the topology to improve the reconciliation cost. We also demonstrate a simple protocol that uses the gene tree rearrangement algorithms to improve gene tree parsimony phylogenetic analyses.
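For orientation, here is a sketch of the naive baseline that such a search improves on: enumerate every rooted SPR neighbor of the gene tree and rescore each one from scratch. This is not the accelerated algorithm described above, which avoids the full rescoring. Trees are assumed to be rooted, binary, nested tuples with gene leaves labeled by species. All names are hypothetical.

```python
def duplication_cost(gene_tree, species_tree):
    """Duplication count under LCA mapping (compact, unoptimized)."""
    clusters = []

    def collect(node):
        # Record the leaf set of every species-tree node.
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = collect(node[0]) | collect(node[1])
        clusters.append(leaves)
        return leaves

    collect(species_tree)

    def walk(node):
        if isinstance(node, str):
            return frozenset([node]), 0
        (lm, ld), (rm, rd) = walk(node[0]), walk(node[1])
        mapped = min((c for c in clusters if lm | rm <= c), key=len)
        return mapped, ld + rd + int(mapped in (lm, rm))

    return walk(gene_tree)[1]


def paths(tree, prefix=()):
    """Yield the child-index path from the root to every node of the tree."""
    yield prefix
    if not isinstance(tree, str):
        yield from paths(tree[0], prefix + (0,))
        yield from paths(tree[1], prefix + (1,))


def get(tree, path):
    """Return the subtree reached by following `path` from the root."""
    for index in path:
        tree = tree[index]
    return tree


def replace(tree, path, new):
    """Return a copy of `tree` with the node at `path` replaced by `new`."""
    if not path:
        return new
    children = list(tree)
    children[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(children)


def spr_neighbors(tree):
    """Yield rooted SPR rearrangements (may include repeats and the input)."""
    for p in paths(tree):
        if not p:
            continue  # the root itself cannot be pruned
        pruned = get(tree, p)
        sibling = get(tree, p[:-1] + (1 - p[-1],))
        rest = replace(tree, p[:-1], sibling)  # suppress the pruned node's parent
        for q in paths(rest):
            # Regraft the pruned subtree on the edge above node q.
            yield replace(rest, q, (get(rest, q), pruned))


def best_spr_neighbor(gene_tree, species_tree, cost=duplication_cost):
    """Return (tree, cost) for a lowest-cost tree in the SPR neighborhood."""
    best = min(spr_neighbors(gene_tree), key=lambda t: cost(t, species_tree))
    return best, cost(best, species_tree)


species = ((("A", "B"), "C"), "D")
gene = ((("A", "C"), "B"), "D")
start_cost = duplication_cost(gene, species)
corrected, corrected_cost = best_spr_neighbor(gene, species)
```

Here the input gene tree implies one duplication, and a single SPR move yields a topology that implies none. Because this sketch rescores every neighbor independently, its running time grows quickly with tree size. That repeated work is what the factor-of-n improvement for SPR neighborhoods targets.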
The new gene tree rearrangement algorithms provide a fast method to address gene tree error. They do not make assumptions about the underlying processes of genome evolution, and they are amenable to analyses of large-scale genomic data sets. These algorithms are also easily incorporated into gene tree parsimony phylogenetic analyses, potentially producing more credible estimates of reconciliation cost.