Benedict Paten, Daniel R. Zerbino, Glenn Hickey, David Haussler
University of California, Santa Cruz, 1156 High St, 95064 Santa Cruz, USA.
BMC Bioinformatics. 2014 Jun 19;15:206. doi: 10.1186/1471-2105-15-206.
Parsimony and maximum likelihood methods of phylogenetic tree estimation, and parsimony methods for genome rearrangements, are central to the study of genome evolution, yet to date they have largely been pursued in isolation.
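Parsimony-based tree estimation repeatedly solves a scoring subproblem: for a fixed tree, what is the minimum number of substitutions that explains the observed characters? The sketch below illustrates that subproblem with Fitch's classic small-parsimony algorithm on rooted binary trees. It is background only, not part of the history-graph method, and the nested-tuple tree encoding and the names `fitch` and `parsimony_score` are our own hypothetical choices.

```python
# Minimal sketch of Fitch's small-parsimony algorithm: the minimum number of
# substitutions needed to explain aligned sequences on a fixed rooted binary tree.
# Trees are nested 2-tuples with leaf names as strings (a hypothetical encoding).


def fitch(tree, leaf_states):
    """Return (candidate root states, substitution count) for one alignment column."""
    if isinstance(tree, str):  # leaf: its observed character, no substitutions
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, leaf_states)
    right_set, right_cost = fitch(right, leaf_states)
    common = left_set & right_set
    if common:  # children can agree on a state: no extra substitution
        return common, left_cost + right_cost
    # Children disagree: take the union of states and pay one substitution.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch substitution counts over all columns of an alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )
```

For instance, on the tree `(("human", "chimp"), ("mouse", "rat"))` with the made-up alignment `{"human": "ACGT", "chimp": "ACGA", "mouse": "GCTA", "rat": "GCTT"}`, `parsimony_score` returns 4: one substitution each for columns 1 and 3, none for column 2, and two for column 4.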
We present a data structure called a history graph that offers a practical basis for the analysis of genome evolution. It conceptually simplifies the study of parsimonious evolutionary histories by representing both substitutions and double cut and join (DCJ) rearrangements in the presence of duplications. The problem of constructing parsimonious history graphs thus subsumes related maximum parsimony problems in the fields of phylogenetic reconstruction and genome rearrangement. We show that tractable functions can be used to define upper and lower bounds on the minimum number of substitutions and DCJ rearrangements needed to explain any history graph. These bounds become tight for a special type of unambiguous history graph called an ancestral variation graph (AVG), whose combinatorial structure constrains the number of operations required. Finally, we demonstrate that for a given history graph G, a finite set of AVGs describes all parsimonious interpretations of G, and that this set can be explored with a few sampling moves.
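The abstract does not spell out how history graphs are built, so the sketch below only illustrates the simpler setting they extend: the standard DCJ model without duplications. A genome is treated as a set of adjacencies between gene extremities (plus telomeres); one DCJ cuts two adjacencies and rejoins the four extremities the other way; and the well-known duplication-free DCJ distance is computed as `N - (C + I/2)`, where N is the number of genes and C and I count cycles and odd-length paths in the adjacency graph. The encoding and the helper names `linear_chromosome`, `dcj` and `dcj_distance` are hypothetical, and none of this implements the paper's history graphs, AVGs or bounds.

```python
# Minimal sketch of the duplication-free DCJ model (not the paper's history graph).
# Genomes are frozensets of "adjacencies": frozensets of one (telomere) or two
# gene extremities, where an extremity is (gene, "t") for tail or (gene, "h") for head.
from collections import defaultdict


def linear_chromosome(signed_genes):
    """Adjacencies and telomeres of one linear chromosome, e.g. [1, -2, 3]."""
    ends = []
    for g in signed_genes:
        tail, head = (abs(g), "t"), (abs(g), "h")
        # A negative sign means the gene is read head-first (inverted).
        ends.append((tail, head) if g > 0 else (head, tail))
    adjs = {frozenset([ends[0][0]]), frozenset([ends[-1][1]])}  # the two telomeres
    for (_, right), (left, _) in zip(ends, ends[1:]):
        adjs.add(frozenset((right, left)))
    return frozenset(adjs)


def dcj(genome, p, q, r, s):
    """One DCJ: cut adjacencies {p, q} and {r, s}, rejoin them as {p, r} and {q, s}."""
    old = {frozenset((p, q)), frozenset((r, s))}
    assert old <= genome, "both adjacencies must be present in the genome"
    return (genome - old) | {frozenset((p, r)), frozenset((q, s))}


def dcj_distance(a, b):
    """DCJ distance N - (C + I/2) from the adjacency graph of a and b (no duplications)."""
    in_a = {ext: adj for adj in a for ext in adj}
    in_b = {ext: adj for adj in b for ext in adj}
    assert in_a.keys() == in_b.keys(), "genomes must contain exactly the same genes"
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # Each extremity is an edge joining its vertex in genome A to its vertex in genome B.
    for ext in in_a:
        parent[find(("A", in_a[ext]))] = find(("B", in_b[ext]))
    edges, has_telomere = defaultdict(int), defaultdict(bool)
    for ext in in_a:
        root = find(("A", in_a[ext]))
        edges[root] += 1
        if len(in_a[ext]) == 1 or len(in_b[ext]) == 1:
            has_telomere[root] = True
    # Components without telomeres are cycles; the rest are paths.
    cycles = sum(1 for root in edges if not has_telomere[root])
    odd_paths = sum(1 for root in edges if has_telomere[root] and edges[root] % 2 == 1)
    n_genes = len(in_a) // 2
    return n_genes - (cycles + odd_paths // 2)
```

For example, with `a = linear_chromosome([1, 2, 3])`, calling `dcj(a, (1, "h"), (2, "t"), (2, "h"), (3, "t"))` inverts gene 2, yielding the same genome as `linear_chromosome([1, -2, 3])`, and `dcj_distance` between the two is 1. Once genes are duplicated, an extremity can occur in several adjacencies and this one-to-one adjacency graph no longer applies; that is the setting history graphs are designed to handle.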
This theoretical study describes a model in which the inference of genome rearrangements and phylogeny can be unified under parsimony.