Liang Shoudan
NASA Ames Research Center, NASA Advanced Supercomputing Division, Moffett Field, CA 94035, USA.
Proc IEEE Comput Soc Bioinform Conf. 2003;2:260-5.
The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is any short nucleotide pattern that differs from a motif of length l by at most d mutations. The algorithm finds such motifs when multiple mutated copies of the motif (i.e., the signals) are sufficiently abundant in the DNA sequence. cWINNOWER substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum number of detectable motifs q_c as a function of sequence length N for random sequences. We found that q_c increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces q_c by a factor of three in this case, making the algorithm considerably more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs only 13 signals to detect motifs in a sequence of length N = 12,000 for (l,d) = (15,4).
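The (l,d) signal definition can be made concrete with a short Python sketch. It is an illustration only, not the paper's implementation: the function names, the example motif, and the example sequences are all hypothetical. It relies on one fact that follows from the triangle inequality: two signals that each lie within d mutations of the same motif differ from each other in at most 2d positions, so pairs within that distance are candidates for belonging to the same clique. The consensus constraint and the sub-clique counting described in the abstract are not reproduced here.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_signal(candidate: str, motif: str, d: int) -> bool:
    """True if candidate differs from motif in at most d positions."""
    return len(candidate) == len(motif) and hamming(candidate, motif) <= d


def compatible_pairs(words: list[str], d: int) -> list[tuple[int, int]]:
    """Index pairs of words that could both be signals of one motif.

    Assumption for illustration: a pair is kept when its Hamming
    distance is at most 2*d, the bound implied by the triangle
    inequality for two signals of the same motif.
    """
    return [
        (i, j)
        for i, j in combinations(range(len(words)), 2)
        if hamming(words[i], words[j]) <= 2 * d
    ]


# Hypothetical example with (l, d) = (15, 4).
motif = "ACGTACGTACGTACG"
s1 = "TTTTACGTACGTACG"    # 3 mutations relative to motif
s2 = "ACGTACGTACGAAAA"    # 3 mutations relative to motif
noise = "GGGGGGGGGGGGGGG"  # unrelated word

pairs = compatible_pairs([s1, s2, noise], d=4)
```

In this example only the two mutated copies of the motif are paired with each other; the unrelated word is too far from both to share an edge.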