Donati Beatrice, Baudet Christian, Sinaimeri Blerina, Crescenzi Pierluigi, Sagot Marie-France
Inria Grenoble - Rhône-Alpes; Inovallée 655, avenue de l'Europe, Montbonnot, Saint Ismier cedex, 38334 France.
Université de Lyon, F-69000, Lyon; Université Lyon 1; CNRS, UMR5558; 43 Boulevard du 11 Novembre 1918, Villeurbanne cedex, 69622 France.
Algorithms Mol Biol. 2015 Jan 23;10(1):3. doi: 10.1186/s13015-014-0031-3. eCollection 2015.
Phylogenetic tree reconciliation is the approach of choice for investigating the coevolution of sets of organisms such as hosts and parasites. It consists in a mapping between the parasite tree and the host tree using event-based maximum parsimony. Given a cost model for the events, many optimal reconciliations are however possible. Any further biological interpretation of them must therefore take this into account, making the capacity to enumerate all optimal solutions a crucial point. Only two algorithms currently exist that attempt such enumeration; in one case not all possible solutions are produced while in the other not all cost vectors are currently handled. The objective of this paper is two-fold. The first is to fill this gap, and the second is to test whether the number of solutions generally observed can be an issue in terms of interpretation.
We present a polynomial-delay algorithm for enumerating all optimal reconciliations. We show that in general many solutions exist. We give an example where, for two pairs of host-parasite trees having each less than 41 leaves, the number of solutions is 5120, even when only time-feasible ones are kept. To facilitate their interpretation, those solutions are also classified in terms of how many of each event they contain. The number of different classes of solutions may thus be notably smaller than the number of solutions, yet they may remain high enough, in particular for the cases where losses have cost 0. In fact, depending on the cost vector, both numbers of solutions and of classes thereof may increase considerably. To further deal with this problem, we introduce and analyse a restricted version where host switches are allowed to happen only between species that are within some fixed distance along the host tree. This restriction allows us to reduce the number of time-feasible solutions while preserving the same optimal cost, as well as to find time-feasible solutions with a cost close to the optimal in the cases where no time-feasible solution is found.
We present Eucalypt, a polynomial-delay algorithm for enumerating all optimal reconciliations which is freely available at http://eucalypt.gforge.inria.fr/.
系统发育树比对是研究宿主与寄生虫等生物群体共同进化的首选方法。它通过基于事件的最大简约法在寄生虫树和宿主树之间进行映射。然而,给定事件的成本模型,可能存在许多最优比对。因此,对它们进行任何进一步的生物学解释都必须考虑到这一点,这使得枚举所有最优解的能力成为一个关键点。目前只有两种算法尝试进行这种枚举;一种情况是没有产生所有可能的解,而另一种情况是目前没有处理所有的成本向量。本文的目标有两个。第一个是填补这一空白,第二个是测试通常观察到的解的数量在解释方面是否会成为一个问题。
我们提出了一种多项式延迟算法来枚举所有最优比对。我们表明一般存在许多解。我们给出一个例子,对于两对叶数均少于41的宿主 - 寄生虫树,即使只保留时间可行的解,解的数量也达到5120个。为便于解释,这些解还根据它们包含的每种事件的数量进行分类。不同类别的解的数量可能明显少于解的数量,但它们可能仍然足够多,特别是对于损失成本为0的情况。事实上,根据成本向量的不同,解的数量及其类别数量都可能大幅增加。为了进一步处理这个问题,我们引入并分析了一个受限版本,其中宿主转换只允许在宿主树上沿固定距离内的物种之间发生。这种限制使我们能够在保持相同最优成本的同时减少时间可行解的数量,并且在找不到时间可行解的情况下找到成本接近最优的时间可行解。
我们提出了Eucalypt,一种用于枚举所有最优比对的多项式延迟算法,可在http://eucalypt.gforge.inria.fr/免费获取。