Rao J Sunil, Karanam Suresh, McCabe Colleen D, Moreno Carlos S
Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH 44106, USA.
Adv Bioinformatics. 2008;2008:369830. doi: 10.1155/2008/369830. Epub 2008 Oct 30.
Background. The computational identification of functional transcription factor binding sites (TFBSs) remains a major challenge in computational biology. Results. We analyzed the conserved promoter sequences for the complete set of human RefSeq genes using our conserved transcription factor binding site (CONFAC) software. CONFAC identified 16,296 human-mouse ortholog gene pairs, of which 9,107 contained conserved TFBSs in the 3 kb proximal promoter and first intron. To attempt to predict in vivo occupancy of TFBSs, we developed a novel marginal effect isolator algorithm that builds upon Bayesian methods for multigroup TFBS filtering, and we used it to predict the in vivo occupancy of two transcription factors with an overall accuracy of 84%. Conclusion. Our analyses show that integrating chromatin immunoprecipitation data with conserved TFBS analysis can generate accurate predictions of functional TFBSs. They also show that TFBS co-occurrence can be used to predict transcription factor binding to promoters in vivo.