Ng Patrick, Nagarajan Niranjan, Jones Neil, Keich Uri
Department of Computer Science, Cornell University, Ithaca, NY, USA.
Bioinformatics. 2006 Jul 15;22(14):e393-401. doi: 10.1093/bioinformatics/btl245.
Effective algorithms for finding relatively weak motifs are an important practical necessity when scanning long DNA sequences for regulatory elements. The success of such an algorithm hinges on the ability of its scoring function, combined with a significance analysis test, to discern real motifs from random noise.
In the first half of the paper we show that the paradigm of relying on entropy scores and their E-values can lead to undesirable results when searching for weak motifs, and we offer alternative approaches to analyzing the significance of motifs. In the second half of the paper we reintroduce a scoring function and present a motif finder that optimizes it; together they are more effective at finding relatively weak motifs than other tools.
The GibbsILR motif finder is available at http://www.cs.cornell.edu/~keich.