用于植入式（l, d）基序搜索的快速实用算法。

Fast and practical algorithms for planted (l, d) motif search.

作者信息

Davila Jaime, Balla Sudha, Rajasekaran Sanguthevar

机构信息

Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-3155, USA.

出版信息

IEEE/ACM Trans Comput Biol Bioinform. 2007 Oct-Dec;4(4):544-52. doi: 10.1109/TCBB.2007.70241.

DOI:10.1109/TCBB.2007.70241

PMID:17975266

Abstract

We consider the planted (l, d) motif search problem, which consists of finding a substring of length l that occurs in a set of input sequences {s1, . . . , sn} with up to d errors, a problem that arises from the need to find transcription factor-binding sites in genomic information. We propose a sequence of practical algorithms, which start based on the ideas considered in PMS1. These algorithms are exact, have little space requirements, and are able to tackle challenging instances with bigger d, taking less time in the instances reported solved by exact algorithms. In particular, one of the proposed algorithms, PMSprune, is able to solve the challenging instances, such as (17, 6) and (19, 7), which were not previously reported as solved in the literature.

摘要

我们考虑植入式（l, d）基序搜索问题，该问题包括在一组输入序列{s1, …, sn}中找到一个长度为l的子串，该子串出现时最多有d个错误，这是一个因需要在基因组信息中找到转录因子结合位点而产生的问题。我们提出了一系列实用算法，这些算法基于PMS1中所考虑的思想开始构建。这些算法是精确的，空间需求小，并且能够处理d值更大的具有挑战性的实例，在精确算法报告已解决的实例中花费的时间更少。特别是，所提出的算法之一PMSprune能够解决具有挑战性的实例，如（17, 6）和（19, 7），这些实例在文献中之前未被报告为已解决。