Kahn Crystal L, Raphael Benjamin J
Box 1910, Brown University, Department of Computer Science & Center for Computational Molecular Biology, Providence, RI 02912, USA.
Pac Symp Biocomput. 2009:126-37.
Segmental duplications are abundant in the human genome, but their evolutionary history is not well understood, in part because of their complex organization: many segmental duplications are mosaics of smaller repeated segments, or duplicons. A two-step model of duplication has been proposed to explain these mosaic patterns. In this model, duplicons are copied and aggregated into primary duplication blocks that subsequently seed secondary duplications. Here, we formalize the problem of computing a duplication scenario that is consistent with the two-step model. We first describe a dynamic programming algorithm to compute the duplication distance between two strings. We then use this distance as the cost function in an integer linear program to obtain the most parsimonious duplication scenario. We apply our method to derive putative ancestral relationships between segmental duplications in the human genome.