Leung Henry C M, Chin Francis Y L
Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong, China.
J Bioinform Comput Biol. 2006 Feb;4(1):43-58. doi: 10.1142/s0219720006001692.
Pevzner and Sze (19) introduced the Planted (l,d)-Motif Problem: find similar patterns (motifs) in sequences that represent the promoter regions of co-regulated genes, where l is the length of the motif and d is the maximum Hamming distance between the motif and each of its occurrences. Many algorithms have been developed to solve this problem. However, these algorithms either have long running times or do not guarantee that the motif will be found. In this paper, we introduce new algorithms for this problem. Our algorithms can find motifs in reasonable time not only for the challenging (9, 2)-, (11, 3)- and (15, 5)-motif problems but also for longer motifs, such as (20, 7), (30, 11) and (40, 15), which have never been seriously attempted by other researchers because of the large time and space required. In addition, our algorithms can be extended to find more complicated motif structures called cis-regulatory modules (CRMs).
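To make the problem statement concrete, the following is a minimal brute-force sketch of the Planted (l,d)-Motif Problem as defined above. It is not the algorithm introduced in the paper; the function names (`hamming`, `occurs_within`, `brute_force_motifs`) and the DNA alphabet `"ACGT"` are assumptions chosen for illustration only.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif: str, seq: str, d: int) -> bool:
    """True if some length-l window of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(
        hamming(motif, seq[i:i + l]) <= d
        for i in range(len(seq) - l + 1)
    )


def brute_force_motifs(seqs, l, d, alphabet="ACGT"):
    """Enumerate every length-l string (hypothetical helper, illustration only)
    and keep those that have an occurrence within distance d in every sequence."""
    result = []
    for chars in product(alphabet, repeat=l):
        candidate = "".join(chars)
        if all(occurs_within(candidate, s, d) for s in seqs):
            result.append(candidate)
    return result


# Toy input: "ACGT" appears exactly in the first sequence and with one
# substitution in the other two ("ACCT", "AGGT").
seqs = ["TTACGTTT", "GGACCTGG", "CCAGGTCC"]
candidates = brute_force_motifs(seqs, l=4, d=1)
print("ACGT" in candidates)
```

This enumeration visits every one of the 4^l candidate strings, so its cost grows exponentially with l. That growth is one reason the longer instances named in the abstract, such as (20, 7) or (40, 15), are out of reach for naive approaches.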