Bafna Vineet, Bansal Vikas
Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA 92093-0114, USA.
IEEE/ACM Trans Comput Biol Bioinform. 2004 Apr-Jun;1(2):78-90. doi: 10.1109/TCBB.2004.23.
We consider the following problem: Given a set of binary sequences, determine lower bounds on the minimum number of recombinations required to explain the history of the sample, under the infinite-sites model of mutation. The problem has implications for finding recombination hotspots and for the Ancestral Recombination Graph reconstruction problem. Hudson and Kaplan gave a lower bound based on the four-gamete test. In practice, their bound Rm often greatly underestimates the minimum number of recombinations. The problem was recently revisited by Myers and Griffiths, who introduced two new lower bounds Rh and Rs which are provably better, and also yield good bounds in practice. However, the worst-case complexities of their procedures for computing Rh and Rs are exponential and super-exponential, respectively. In this paper, we show that the number of nontrivial connected components, Rc, in the conflict graph for a given set of sequences, computable in time O(nm2), is also a lower bound on the minimum number of recombination events. We show that in many cases, Rc is a better bound than Rh. The conflict graph was used by Gusfield et al. to obtain a polynomial time algorithm for the galled tree problem, which is a special case of the Ancestral Recombination Graph (ARG) reconstruction problem. Our results also offer some insight into the structural properties of this graph and are of interest for the general Ancestral Recombination Graph reconstruction problem.
给定一组二进制序列,在无限位点突变模型下,确定解释样本历史所需的最小重组数的下界。该问题对于寻找重组热点以及祖先重组图重建问题具有重要意义。哈德森和卡普兰基于四配子检验给出了一个下界。在实际应用中,他们的界值(Rm)常常大大低估了最小重组数。迈尔斯和格里菲思最近重新审视了这个问题,他们引入了两个新的下界(Rh)和(Rs),这两个下界在理论上被证明更好,并且在实际应用中也能给出较好的界值。然而,他们计算(Rh)和(Rs)的过程的最坏情况复杂度分别是指数级和超指数级的。在本文中,我们表明对于给定的一组序列,在冲突图中可在(O(nm^2))时间内计算出的非平凡连通分量的数量(Rc),也是重组事件最小数量的一个下界。我们表明在许多情况下,(Rc)是比(Rh)更好的界值。古塞菲尔德等人利用冲突图获得了一个用于有结树问题的多项式时间算法,有结树问题是祖先重组图(ARG)重建问题的一个特殊情况。我们的结果也为该图的结构性质提供了一些见解,并且对于一般的祖先重组图重建问题具有重要意义。