Benson G
Department of Biomathematical Sciences, Mount Sinai School of Medicine, New York, New York 10029-6574, USA.
J Comput Biol. 1997 Fall;4(3):351-67. doi: 10.1089/cmb.1997.4.351.
Algorithm development for comparing and aligning biological sequences has, until recently, been based on the SI model of mutational events, which assumes that sequences are modified through the operations of substitution, insertion, or deletion (the latter two collectively termed indels). While this model has worked fairly well, it has long been apparent that other mutational events occur. In this paper, we introduce a new model, the DSI model, which includes another common mutational event, tandem duplication. Tandem duplication produces tandem repeats, which are common in DNA, making up perhaps 10% of the human genome. Tandem repeats are responsible for some human diseases and may serve a multitude of functions in DNA regulation and evolution. Using the DSI model, we develop new exact and heuristic algorithms for comparing and aligning DNA sequences when they contain tandem repeats.
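To make the contrast between the two models concrete, the following is a minimal sketch and not the algorithms developed in the paper. It assumes unit costs for substitution, insertion, and deletion, and the names `si_distance` and `tandem_duplicate` are hypothetical helpers introduced only for illustration. It shows how a single tandem duplication event is scored as several independent indels when only SI operations are available.

```python
def si_distance(a: str, b: str) -> int:
    """Unit-cost edit distance using only substitution, insertion, deletion.

    Assumption: every operation costs 1; real scoring schemes differ.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def tandem_duplicate(seq: str, start: int, length: int, copies: int = 2) -> str:
    """Replace seq[start:start+length] with `copies` adjacent copies of itself."""
    unit = seq[start:start + length]
    return seq[:start] + unit * copies + seq[start + length:]


ancestor = "ACGTTCA"
descendant = tandem_duplicate(ancestor, start=2, length=3)  # duplicates "GTT"
print(descendant)                         # ACGTTGTTCA
print(si_distance(ancestor, descendant))  # 3
```

Under these assumed unit costs, the one duplication of the three-base unit `GTT` costs three insertions in the SI model. A model that treats tandem duplication as its own operation could instead account for it as a single event.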