Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA.
Nucleic Acids Res. 2011 Apr;39(7):e42. doi: 10.1093/nar/gkq948. Epub 2010 Dec 11.
We present a new algorithm, BOBRO, for predicting cis-regulatory motifs in a given set of promoter sequences. The algorithm substantially improves prediction accuracy and extends the scope of applicability of existing programs through two key new ideas: (i) a highly effective method for reliably assessing the likelihood that each position in a given promoter is the (approximate) start of a conserved sequence motif; and (ii) a highly reliable way of distinguishing actual motifs from accidental ones, based on the concept of 'motif closure'. These two ideas are embedded in a classical framework that finds motifs by finding cliques in a graph, and they make this framework substantially more sensitive as well as more selective when searching for motifs in a very noisy background. A comparative analysis shows that, on large-scale data sets of promoters from one genome, our program achieved a performance coefficient of 41%, compared with 29% for the best of six other state-of-the-art prediction tools, and that it also improved consistently, by substantial margins, on a second kind of large-scale data set consisting of orthologous promoters across multiple genomes. The power of BOBRO in dealing with noisy data was further demonstrated by identifying the motifs of global transcriptional regulators in a run over 2390 promoter sequences of Escherichia coli K12.
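The abstract reports a performance coefficient without defining it. As a rough illustration only, the sketch below assumes the nucleotide-level definition commonly used in motif-finding benchmarks, |K ∩ P| / |K ∪ P|, where K is the set of nucleotide positions covered by known binding sites and P the set covered by predicted sites. The function name, the `(sequence_id, start, end)` site tuples and the half-open coordinate convention are assumptions made for this example and are not taken from BOBRO.

```python
def performance_coefficient(known_sites, predicted_sites):
    """Nucleotide-level overlap between known and predicted sites.

    Each site is a hypothetical (sequence_id, start, end) tuple with a
    half-open interval [start, end). Returns |K & P| / |K | P|, or 0.0
    when both site lists are empty.
    """
    def covered_positions(sites):
        # Expand every site into the individual (sequence, position) pairs it covers.
        positions = set()
        for seq_id, start, end in sites:
            positions.update((seq_id, pos) for pos in range(start, end))
        return positions

    known = covered_positions(known_sites)
    predicted = covered_positions(predicted_sites)
    union = known | predicted
    if not union:
        return 0.0
    return len(known & predicted) / len(union)


# One known 10-nt site and one prediction shifted by 5 nt on the same promoter:
# 5 shared positions out of 15 covered in total.
example_known = [("promoter_1", 10, 20)]
example_predicted = [("promoter_1", 15, 25)]
example_score = performance_coefficient(example_known, example_predicted)
```

Under this assumed definition, a score of 1.0 requires the predicted sites to cover exactly the known nucleotides, so both missed sites and spurious predictions lower the value.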