Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, GA 30602, USA.
Nucleic Acids Res. 2010 Jan;38(2):e12. doi: 10.1093/nar/gkp907. Epub 2009 Nov 11.
We present a new computational method for a classical problem, the identification of cis-regulatory motifs in a given set of promoter sequences, based on one key new idea. Instead of scoring candidate motifs individually, as all existing motif-finding programs do, our method scores groups of candidate motifs with similar sequences, called motif closures, using a P-value, which substantially improves prediction reliability over existing methods. Our new P-value scoring scheme is independent of sequence length, allowing direct comparison of predicted motifs of different lengths on the same footing. We have implemented this method as the Motif Recognition Computer (MREC) program and have extensively tested MREC on both simulated and biological data from prokaryotic genomes. Our test results indicate that MREC accurately picks out the actual motif, with the correct length, as the best-scoring candidate in the vast majority of cases in our test set. We compared our prediction results with those of two motif-finding programs, Cosmo and MEME, and found that MREC outperforms both by a large margin across all test cases. The MREC program is available at http://csbl.bmb.uga.edu/~bingqiang/MREC1/.
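The idea of scoring a group of similar candidates with a P-value can be illustrated with a minimal sketch. It does not reproduce MREC: the abstract does not define a motif closure or its null model, so the sketch assumes a closure is every k-mer within Hamming distance d of a center word, assumes a uniform independent background over A, C, G and T, and treats overlapping windows as independent. All function names are hypothetical.

```python
from math import comb


def hamming(a, b):
    """Number of mismatching positions between two equal-length words."""
    return sum(x != y for x, y in zip(a, b))


def sequences_hit(seqs, center, d):
    """Count sequences with at least one k-mer within distance d of center."""
    k = len(center)
    return sum(
        any(hamming(s[i:i + k], center) <= d for i in range(len(s) - k + 1))
        for s in seqs
    )


def site_probability(k, d):
    """Chance that a random k-mer lies in the closure (uniform background)."""
    return sum(comb(k, i) * 3 ** i for i in range(d + 1)) / 4 ** k


def closure_p_value(seqs, center, d):
    """Binomial tail: P(at least the observed number of sequences are hit).

    Assumption: windows within a sequence are treated as independent,
    which is only an approximation for overlapping k-mers.
    """
    k, n = len(center), len(seqs)
    hits = sequences_hit(seqs, center, d)
    p_site = site_probability(k, d)
    # Average number of k-mer windows per sequence.
    windows = sum(max(len(s) - k + 1, 0) for s in seqs) / n
    p_seq = 1 - (1 - p_site) ** windows
    return sum(
        comb(n, j) * p_seq ** j * (1 - p_seq) ** (n - j)
        for j in range(hits, n + 1)
    )


if __name__ == "__main__":
    promoters = ["TTACGTTT", "GGGGGGGG", "AACGAAAA"]
    for word in ("ACGT", "CCCC", "ACGTT"):
        print(word, closure_p_value(promoters, word, 1))
```

Because the score in this sketch is a probability rather than a raw count, candidates of different lengths (here `ACGT` and `ACGTT`) can be compared on one scale, which is the property the abstract attributes to its own P-value scheme.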