Bonizzoni Paola, Carrieri Anna Paola, Della Vedova Gianluca, Trucco Gabriella
BMC Genomics. 2014;15 Suppl 6(Suppl 6):S10. doi: 10.1186/1471-2164-15-S6-S10. Epub 2014 Oct 17.
The perfect phylogeny is an often used model in phylogenetics since it provides an efficient basic procedure for representing the evolution of genomic binary characters in several frameworks, such as for example in haplotype inference. The model, which is conceptually the simplest, is based on the infinite sites assumption, that is no character can mutate more than once in the whole tree. A main open problem regarding the model is finding generalizations that retain the computational tractability of the original model but are more flexible in modeling biological data when the infinite site assumption is violated because of e.g. back mutations. A special case of back mutations that has been considered in the study of the evolution of protein domains (where a domain is acquired and then lost) is persistency, that is the fact that a character is allowed to return back to the ancestral state. In this model characters can be gained and lost at most once. In this paper we consider the computational problem of explaining binary data by the Persistent Perfect Phylogeny model (referred as PPP) and for this purpose we investigate the problem of reconstructing an evolution where some constraints are imposed on the paths of the tree.
We define a natural generalization of the PPP problem obtained by requiring that for some pairs (character, species), neither the species nor any of its ancestors can have the character. In other words, some characters cannot be persistent for some species. This new problem is called Constrained PPP (CPPP). Based on a graph formulation of the CPPP problem, we are able to provide a polynomial time solution for the CPPP problem for matrices whose conflict graph has no edges. Using this result, we develop a parameterized algorithm for solving the CPPP problem where the parameter is the number of characters.
A preliminary experimental analysis shows that the constrained persistent perfect phylogeny model allows to explain efficiently data that do not conform with the classical perfect phylogeny model.
完美系统发育是系统发育学中常用的模型,因为它为在多个框架中表示基因组二元特征的进化提供了一个有效的基本过程,例如在单倍型推断中。该模型在概念上是最简单的,基于无限位点假设,即整个树中任何特征都不会发生多次突变。关于该模型的一个主要开放问题是找到一些推广,这些推广保留了原始模型的计算易处理性,但在由于例如反向突变而违反无限位点假设时,在对生物数据建模方面更灵活。在蛋白质结构域进化研究(其中一个结构域被获得然后丢失)中考虑的反向突变的一个特殊情况是持久性,即一个特征被允许回到祖先状态这一事实。在这个模型中,特征最多只能获得和丢失一次。在本文中,我们考虑用持久完美系统发育模型(简称为PPP)解释二元数据的计算问题,为此我们研究在树的路径上施加一些约束时的进化重建问题。
我们通过要求对于某些对(特征,物种),该物种及其任何祖先都不能具有该特征,定义了PPP问题的一种自然推广。换句话说,某些特征对于某些物种不能是持久的。这个新问题称为受限PPP(CPPP)。基于CPPP问题的图表示,我们能够为冲突图没有边的矩阵的CPPP问题提供多项式时间解。利用这个结果,我们开发了一种参数化算法来解决CPPP问题,其中参数是特征的数量。
初步的实验分析表明受限持久完美系统发育模型能够有效地解释不符合经典完美系统发育模型的数据。