Jensen Shane T, Liu Jun S
Department of Statistics, Harvard University, Cambridge, MA 02138-2901, USA.
Bioinformatics. 2004 Jul 10;20(10):1557-64. doi: 10.1093/bioinformatics/bth127. Epub 2004 Feb 12.
Transcription factors (TFs) bind directly to short segments on the genome, often within hundreds to thousands of base pairs upstream of gene transcription start sites, to regulate gene expression. The experimental determination of TF binding sites is expensive and time-consuming. Many motif-finding programs have been developed, but no program is clearly superior in all situations. Practitioners often find it difficult to judge which of the motifs predicted by these algorithms are more likely to be biologically relevant.
We derive a comprehensive scoring function based on a full Bayesian model that can handle unknown site abundance, unknown motif width and two-block motifs with variable-length gaps. An algorithm called BioOptimizer is proposed to optimize this scoring function so as to reduce noise in the motif signal found by any motif-finding program. BioOptimizer can be used in conjunction with several existing programs, and its accuracy is shown to be superior to that of any of these motif-finding programs used alone, both in simulation studies and in application to sets of co-regulated genes in bacteria. In addition, this scoring function formulation enables us to objectively compare different predicted motifs and select the optimal ones, effectively combining the strengths of existing programs.
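The idea of scoring a set of predicted sites and then refining it can be illustrated with a minimal sketch. The abstract does not give the actual scoring function, so everything below is an assumption made for illustration: a fixed motif width, a single-block motif, a symmetric Dirichlet prior on each motif column, a fixed background distribution, and a fixed prior probability that any position starts a site. The names `log_score`, `overlaps`, `optimize`, `alpha` and `site_prior` are hypothetical and are not taken from BioOptimizer. The optimizer is a plain greedy search that toggles one candidate site at a time and keeps the change only if the score improves.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def log_score(seqs, sites, width, background, alpha=0.5, site_prior=0.01):
    """Log odds of a site set versus 'no motif' under a simplified model.

    Illustrative only: Dirichlet-multinomial motif columns, a fixed
    background, and an independent per-position site prior.
    """
    if not sites:
        return 0.0
    k = len(sites)
    # Prior log odds for declaring k positions to be sites.
    score = k * math.log(site_prior / (1.0 - site_prior))
    for j in range(width):
        counts = Counter(seqs[s][p + j] for s, p in sites)
        # Marginal likelihood of column j with the column frequencies
        # integrated out against a symmetric Dirichlet(alpha) prior.
        score += math.lgamma(4 * alpha) - math.lgamma(4 * alpha + k)
        for b in ALPHABET:
            c = counts[b]
            score += math.lgamma(alpha + c) - math.lgamma(alpha)
            # Subtract the background likelihood these letters would have had.
            score -= c * math.log(background[b])
    return score

def overlaps(site, sites, width):
    """True if `site` overlaps any site in `sites` on the same sequence."""
    s, p = site
    return any(s == s2 and abs(p - p2) < width for s2, p2 in sites)

def optimize(seqs, init_sites, width, background, **kw):
    """Greedy refinement: toggle one site at a time while the score improves."""
    sites = set(init_sites)
    best = log_score(seqs, sites, width, background, **kw)
    improved = True
    while improved:
        improved = False
        for s, seq in enumerate(seqs):
            for p in range(len(seq) - width + 1):
                cand = (s, p)
                if cand in sites:
                    trial = sites - {cand}      # try dropping a predicted site
                elif overlaps(cand, sites, width):
                    continue                    # keep sites non-overlapping
                else:
                    trial = sites | {cand}      # try adding a new site
                sc = log_score(seqs, trial, width, background, **kw)
                if sc > best + 1e-12:
                    sites, best = trial, sc
                    improved = True
    return sites, best

# Toy usage: start from a noisy prediction (one site is deliberately wrong).
background = {b: 0.25 for b in ALPHABET}
seqs = ["ACGTTGACAGTCA", "GGTTGACATCCGA", "CATCGTTGACAAG", "TTGACACGGATCC"]
initial = {(0, 3), (1, 2), (2, 0)}
refined, refined_score = optimize(seqs, initial, 6, background)
print(sorted(refined), round(refined_score, 3))
```

Because every accepted move strictly increases the score and the number of possible site sets is finite, the loop terminates at a local optimum. The same `log_score` value can also be used to rank site sets proposed by different motif finders for the same sequences, which is the comparison role the abstract describes for its scoring function.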
BioOptimizer is available for download at www.fas.harvard.edu/~junliu/BioOptimizer/