Chindelevitch Leonid, Li Zhentao, Blais Eric, Blanchette Mathieu
School of Computer Science, McGill University, 3480 University Street, Montreal, Quebec, H3A 2A7, Canada.
J Bioinform Comput Biol. 2006 Jun;4(3):721-44. doi: 10.1142/s0219720006002168.
Given a multiple alignment of orthologous DNA sequences and a phylogenetic tree for these sequences, we investigate the problem of reconstructing a most parsimonious scenario of insertions and deletions capable of explaining the gaps observed in the alignment. This problem, called the Indel Parsimony Problem, is a crucial component of the problem of ancestral genome reconstruction, and its solution provides valuable information to many genome functional annotation approaches. We first show that the problem is NP-complete. Second, we provide an algorithm, based on the fractional relaxation of an integer linear programming formulation. The algorithm is fast in practice, and the solutions it produces are, in most cases, provably optimal. We describe a divide-and-conquer approach that makes it possible to solve very large instances on a simple desktop machine, while retaining guaranteed optimality. Our algorithms are tested and shown efficient and accurate on a set of 1.8 Mb mammalian orthologous sequences in the CFTR region.
给定直系同源DNA序列的多重比对以及这些序列的系统发育树,我们研究了重建一个最简约的插入和缺失情况的问题,该情况能够解释比对中观察到的空位。这个问题,称为插入缺失简约问题,是祖先基因组重建问题的关键组成部分,其解决方案为许多基因组功能注释方法提供了有价值的信息。我们首先表明该问题是NP完全问题。其次,我们基于整数线性规划公式的分数松弛提供了一种算法。该算法在实际应用中速度很快,并且在大多数情况下,它产生的解决方案被证明是最优的。我们描述了一种分治方法,该方法使得在一台简单的台式机上解决非常大的实例成为可能,同时保持最优性的保证。我们的算法在CFTR区域的一组1.8 Mb哺乳动物直系同源序列上进行了测试,结果表明它们高效且准确。