Lafond Manuel, Ouangraoua Aïda, El-Mabrouk Nadia
BMC Bioinformatics. 2015;16 Suppl 14(Suppl 14):S4. doi: 10.1186/1471-2105-16-S14-S4. Epub 2015 Oct 2.
Combining a set of trees on partial datasets into a single tree is a classical method for inferring large phylogenetic trees. Ideally, the combined tree should display each input partial tree, which is only possible if input trees do not contain contradictory phylogenetic information. The simplest version of the supertree problem is thus to state whether a set of trees is compatible, and if so, construct a tree displaying them all. Classically, supertree methods have been applied to the reconstruction of species trees. Here we rather consider reconstructing a super gene tree in light of a known species tree S. We define the supergenetree problem as finding, among all supertrees displaying a set of input gene trees, one supertree minimizing a reconciliation distance with S. We first show how classical exact methods to the supertree problem can be extended to the supergenetree problem. As all these methods are highly exponential, we also exhibit a natural greedy heuristic for the duplication cost, based on minimizing the set of duplications preceding the first speciation event. We then show that both the supergenetree problem and its restriction to minimizing duplications preceding the first speciation are NP-hard to approximate within a n1-ϵ factor, for any 0 < ϵ < 1. Finally, we show that a restriction of this problem to uniquely labeled speciation gene trees, which is relevant to many biological applications, is also NP-hard. Therefore, we introduce new avenues in the field of supertrees, and set the theoretical basis for the exploration of various algorithmic aspects of the problems.
将部分数据集上的一组树合并为一棵单一的树是推断大型系统发育树的经典方法。理想情况下,合并后的树应展示每棵输入的部分树,只有当输入树不包含相互矛盾的系统发育信息时才有可能。因此,超树问题的最简单版本是说明一组树是否兼容,如果兼容,则构建一棵展示所有这些树的树。传统上,超树方法已应用于物种树的重建。在这里,我们考虑根据已知的物种树S重建一棵超基因树。我们将超基因树问题定义为在展示一组输入基因树的所有超树中,找到一棵与S的和解距离最小的超树。我们首先展示了如何将超树问题的经典精确方法扩展到超基因树问题。由于所有这些方法都是高度指数级的,我们还基于最小化第一次物种形成事件之前的重复集,展示了一种针对重复成本的自然贪婪启发式方法。然后我们表明,对于任何0 < ϵ < 1,超基因树问题及其对最小化第一次物种形成之前的重复的限制在n1-ϵ因子内难以近似求解。最后,我们表明将这个问题限制在唯一标记的物种形成基因树上,这与许多生物学应用相关,也是NP难的。因此,我们在超树领域引入了新的途径,并为探索该问题的各种算法方面奠定了理论基础。