具有受限重组的系统发育网络的最优、高效重建。

Optimal, efficient reconstruction of phylogenetic networks with constrained recombination.

作者信息

Gusfield Dan, Eddhu Satish, Langley Charles

机构信息

Computer Science, University of California, Davis, Kemper Hall, Davis, California, 95616, USA.

出版信息

J Bioinform Comput Biol. 2004 Mar;2(1):173-213. doi: 10.1142/s0219720004000521.

DOI:10.1142/s0219720004000521

PMID:15272438

Abstract

A phylogenetic network is a generalization of a phylogenetic tree, allowing structural properties that are not tree-like. In a seminal paper, Wang et al.(1) studied the problem of constructing a phylogenetic network, allowing recombination between sequences, with the constraint that the resulting cycles must be disjoint. We call such a phylogenetic network a "galled-tree". They gave a polynomial-time algorithm that was intended to determine whether or not a set of sequences could be generated on galled-tree. Unfortunately, the algorithm by Wang et al.(1) is incomplete and does not constitute a necessary test for the existence of a galled-tree for the data. In this paper, we completely solve the problem. Moreover, we prove that if there is a galled-tree, then the one produced by our algorithm minimizes the number of recombinations over all phylogenetic networks for the data, even allowing multiple-crossover recombinations. We also prove that when there is a galled-tree for the data, the galled-tree minimizing the number of recombinations is "essentially unique". We also note two additional results: first, any set of sequences that can be derived on a galled tree can be derived on a true tree (without recombination cycles), where at most one back mutation per site is allowed; second, the site compatibility problem (which is NP-hard in general) can be solved in polynomial time for any set of sequences that can be derived on a galled tree. Perhaps more important than the specific results about galled-trees, we introduce an approach that can be used to study recombination in general phylogenetic networks. This paper greatly extends the conference version that appears in an earlier work.(8) PowerPoint slides of the conference talk can be found at our website.(7).

摘要

系统发生网络是系统发生树的一种推广，它允许非树状的结构特性。在一篇开创性的论文中，王等人（1）研究了构建系统发生网络的问题，该网络允许序列之间发生重组，且限制所产生的环必须不相交。我们将这样的系统发生网络称为“带结树”。他们给出了一个多项式时间算法，旨在确定一组序列是否可以在带结树上生成。不幸的是，王等人（1）的算法并不完整，对于数据是否存在带结树而言，它并不构成一个必要的检验。在本文中，我们完全解决了这个问题。此外，我们证明，如果存在带结树，那么我们的算法所产生的带结树在所有针对该数据的系统发生网络中使重组次数最小，甚至允许多次交叉重组。我们还证明，当数据存在带结树时，使重组次数最小的带结树“本质上是唯一的”。我们还注意到另外两个结果：第一，任何可以在带结树上推导出来的序列集都可以在一棵真实树（无重组环）上推导出来，其中每个位点最多允许一次反向突变；第二，对于任何可以在带结树上推导出来的序列集，位点兼容性问题（一般来说是NP难问题）都可以在多项式时间内解决。也许比关于带结树的具体结果更重要的是，我们引入了一种可用于研究一般系统发生网络中重组的方法。本文极大地扩展了早期工作中出现的会议版本（8）。会议报告的PowerPoint幻灯片可在我们的网站（7）上找到。