Styczynski Mark P, Jensen Kyle L, Rigoutsos Isidore, Stephanopoulos Gregory N
Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA.
Genome Inform. 2004;15(2):63-71.
The (l,d)-motif challenge problem, as introduced by Pevzner and Sze, is a mathematical abstraction of the DNA functional site discovery task. Here we expand the (l,d)-motif problem to more accurately model this task and present a novel algorithm to solve this extended problem. This algorithm is guaranteed to find all (l,d)-motifs in a set of input sequences with unbounded support and length. We demonstrate the performance of the algorithm on publicly available datasets and show that the algorithm deterministically enumerates the optimal motifs.
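To make the problem statement concrete, the following is a minimal sketch of the (l,d)-motif definition as it is commonly read: a string of length l is a motif if every input sequence contains some length-l substring within Hamming distance d of it. This is a naive exhaustive check over all candidate strings, not the algorithm described in the abstract. The function names and the example sequences are hypothetical and chosen only for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(candidate, sequence, d):
    """True if some substring of `sequence` with the candidate's length
    differs from `candidate` in at most d positions."""
    l = len(candidate)
    return any(
        hamming(candidate, sequence[i:i + l]) <= d
        for i in range(len(sequence) - l + 1)
    )


def naive_ld_motifs(sequences, l, d, alphabet="ACGT"):
    """Enumerate every length-l string that occurs, with at most d
    mismatches, in every input sequence. Assumption: this brute-force
    search over all |alphabet|**l candidates only illustrates the
    problem definition; it is exponential in l."""
    motifs = []
    for letters in product(alphabet, repeat=l):
        candidate = "".join(letters)
        if all(occurs_within(candidate, s, d) for s in sequences):
            motifs.append(candidate)
    return motifs


# Hypothetical toy input, small enough to check by hand.
toy_sequences = ["ACGTAC", "TTACGA", "GACGTT"]
exact_motifs = naive_ld_motifs(toy_sequences, l=3, d=0)
```

Because the search visits every possible string of length l, it is only practical for very small l; it is meant to show what "finding all (l,d)-motifs" asks for, not how to do it efficiently.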