Abouelhoda Mohamed I, Giegerich Robert, Behzadi Behshad, Steyaert Jean-Marc
Cairo University, Giza, Egypt.
J Bioinform Comput Biol. 2009 Apr;7(2):287-308. doi: 10.1142/s0219720009004060.
Subsequent duplication events are responsible for the evolution of the minisatellite maps. Alignment of two minisatellite maps should therefore take these duplication events into account, in addition to the well-known edit operations. All algorithms for computing an optimal alignment of two maps, including the one presented here, first deduce the costs of optimal duplication scenarios for all substrings of the given maps. Then, they incorporate the pre-computed costs in the alignment recurrence. However, all previous algorithms addressing this problem are dependent on the number of distinct map units (map alphabet) and do not fully make use of the repetitiveness of the map units. In this paper, we present an algorithm that remedies these shortcomings: our algorithm is alphabet-independent and is based on the run-length encoding scheme. It is the fastest in theory, and in practice as well, as shown by experimental results. Furthermore, our alignment model is more general than that of the previous algorithms, and captures better the duplication mechanism. Using our algorithm, we derive a quantitative evidence that there is a directional bias in the growth of minisatellites of the MSY1 dataset.
随后的复制事件导致了微卫星图谱的进化。因此,除了众所周知的编辑操作外,两个微卫星图谱的比对还应考虑这些复制事件。所有用于计算两个图谱最优比对的算法,包括本文提出的算法,首先要推导出给定图谱所有子串的最优复制情况的成本。然后,它们将预先计算的成本纳入比对递推中。然而,之前所有解决这个问题的算法都依赖于不同图谱单元(图谱字母表)的数量,并且没有充分利用图谱单元的重复性。在本文中,我们提出了一种算法来弥补这些不足:我们的算法与字母表无关,并且基于游程编码方案。理论上它也是最快的,实验结果也表明在实践中同样如此。此外,我们的比对模型比之前的算法更通用,并且能更好地捕捉复制机制。使用我们的算法,我们得出了定量证据,表明MSY1数据集的微卫星在增长过程中存在方向偏差。