Friedrich Miescher Institute for Biomedical Research, Novartis Research Foundation, Basel, Switzerland.
BMC Bioinformatics. 2009 Oct 16;10:339. doi: 10.1186/1471-2105-10-339.
It is known that transcription factors frequently act together to regulate gene expression in eukaryotes. In this paper we describe a computational analysis of transcription factor site dependencies in human, mouse and rat genomes.
Our approach for quantifying the tendency of transcription factor binding sites to co-occur is based on a binding site scoring function that incorporates dependencies between positions, uses information about the structural class of each transcription factor (major/minor groove binder), and takes into account the possible implications of varying GC content of the sequences. Significant tendencies (dependencies) were detected with a non-parametric statistical method (permutation tests). The results were evaluated in several ways: against reports from the literature (many of the significant dependencies between transcription factors have previously been confirmed experimentally); by verifying that dependencies between transcription factors are not biased by similarities in their DNA-binding sites; by showing that the number of dependent transcription factors that belong to the same functional and structural class is significantly higher than would be expected by chance; and with supporting evidence from GO clustering of target genes. Based on dependencies between two transcription factor binding sites (second-order dependencies), it is possible to construct higher-order dependencies (networks). Moreover, results on transcription factor binding site dependencies can be used to predict groups of dependent transcription factors on a given promoter sequence. Our results, as well as a scanning tool for predicting groups of dependent transcription factor binding sites, are available on the Internet.
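The permutation-test idea can be illustrated with a minimal sketch. It is not the scoring function or the exact test used in this work: it assumes that each promoter has already been reduced to a yes/no call for the presence of a site for each of two factors, and it ignores structural class and GC content. The function name `cooccurrence_pvalue` and its parameters are hypothetical.

```python
import random


def cooccurrence_pvalue(has_a, has_b, n_perm=10000, seed=0):
    """Permutation test for co-occurrence of two binding sites.

    has_a, has_b: equal-length sequences of 0/1 (or bool) flags, one per
    promoter, marking whether a site for factor A / factor B was predicted.
    Returns (observed co-occurrence count, one-sided p-value).
    Illustrative only; the inputs are assumed, not taken from the paper.
    """
    if len(has_a) != len(has_b):
        raise ValueError("both vectors must cover the same promoters")

    def count_both(xs, ys):
        # Number of promoters carrying a site for both factors.
        return sum(int(bool(x) and bool(y)) for x, y in zip(xs, ys))

    rng = random.Random(seed)
    observed = count_both(has_a, has_b)

    # Shuffling B's labels across promoters breaks any association with A
    # while keeping the number of promoters that carry each site fixed.
    shuffled = list(has_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_both(has_a, shuffled) >= observed:
            hits += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

A small p-value indicates that the two sites share promoters more often than label shuffling alone would produce. In a screen over many factor pairs, the resulting p-values would additionally need a multiple-testing correction.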
We show that the computational analysis of transcription factor site dependencies is a valuable complement to experimental approaches for discovering transcriptional regulatory interactions and networks. Scanning promoter sequences with dependent groups of transcription factor binding sites improves the quality of transcription factor predictions.