Liang S, Samanta M P, Biegel B A
NASA Ames Research Center, NASA Advanced Supercomputing Division, Moffett Field, CA 94035, USA.
J Bioinform Comput Biol. 2004 Mar;2(1):47-60. doi: 10.1142/s0219720004000466.
The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern that differs from a motif of length l by at most d mutations. The algorithm finds such a motif if the DNA sequence contains a clique formed by a sufficiently large number of mutated copies of the motif (i.e., the signals). The cWINNOWER algorithm substantially improves on the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, which enables it to detect much weaker signals. We studied the minimum detectable clique size qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. In this case, imposing consensus constraints reduces qc by a factor of three, making the algorithm considerably more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12,000 for (l, d) = (15, 4).
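The definitions in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it omits the consensus constraint, whose details the abstract does not give. It assumes that mutations are substitutions (so the number of differences is a Hamming distance), that two l-mers are joined by an edge when they do not overlap and differ in at most 2d positions (two signals of the same motif can differ by at most 2d, by the triangle inequality), and that three-member sub-cliques are counted naively per vertex. The function names `hamming`, `is_signal`, `build_graph`, and `triangle_counts` are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length patterns differ."""
    if len(a) != len(b):
        raise ValueError("patterns must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_signal(pattern: str, motif: str, d: int) -> bool:
    """A signal differs from the motif by at most d substitutions (assumption)."""
    return hamming(pattern, motif) <= d


def build_graph(seq: str, l: int, d: int) -> dict[int, set[int]]:
    """Join non-overlapping l-mers that differ in at most 2*d positions.

    Vertices are start positions in seq. This is a naive O(N^2) construction
    chosen for clarity, not the procedure used in the paper.
    """
    starts = range(len(seq) - l + 1)
    adj: dict[int, set[int]] = {i: set() for i in starts}
    for i, j in combinations(starts, 2):
        if j - i >= l and hamming(seq[i:i + l], seq[j:j + l]) <= 2 * d:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def triangle_counts(adj: dict[int, set[int]]) -> dict[int, int]:
    """For each vertex, count the three-member cliques that contain it."""
    counts = {v: 0 for v in adj}
    for v, nbrs in adj.items():
        for a, b in combinations(sorted(nbrs), 2):
            if b in adj[a]:
                counts[v] += 1
    return counts


if __name__ == "__main__":
    # Three copies of the toy motif ACGT, two of them with one substitution.
    seq = "ACGT" + "ACGA" + "TCGT"
    graph = build_graph(seq, l=4, d=1)
    print(triangle_counts(graph)[0])
```

Vertices that belong to many three-member cliques are candidates for membership in a large clique of signals. In this sketch the filtering threshold is left to the caller, since the abstract does not state one.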