Davila Jaime, Balla Sudha, Rajasekaran Sanguthevar
Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-3155, USA.
IEEE/ACM Trans Comput Biol Bioinform. 2007 Oct-Dec;4(4):544-52. doi: 10.1109/TCBB.2007.70241.
We consider the planted (l, d) motif search problem, which consists of finding a substring of length l that occurs in a set of input sequences {s1, . . . , sn} with up to d errors, a problem that arises from the need to find transcription factor-binding sites in genomic information. We propose a sequence of practical algorithms, which start based on the ideas considered in PMS1. These algorithms are exact, have little space requirements, and are able to tackle challenging instances with bigger d, taking less time in the instances reported solved by exact algorithms. In particular, one of the proposed algorithms, PMSprune, is able to solve the challenging instances, such as (17, 6) and (19, 7), which were not previously reported as solved in the literature.
我们考虑植入式(l, d)基序搜索问题,该问题包括在一组输入序列{s1, …, sn}中找到一个长度为l的子串,该子串出现时最多有d个错误,这是一个因需要在基因组信息中找到转录因子结合位点而产生的问题。我们提出了一系列实用算法,这些算法基于PMS1中所考虑的思想开始构建。这些算法是精确的,空间需求小,并且能够处理d值更大的具有挑战性的实例,在精确算法报告已解决的实例中花费的时间更少。特别是,所提出的算法之一PMSprune能够解决具有挑战性的实例,如(17, 6)和(19, 7),这些实例在文献中之前未被报告为已解决。