Yang Eric, Simcha David, Almon Richard R, Dubois Debra C, Jusko William J, Androulakis Ioannis P
Biomedical Engineering Department, Rutgers University, Piscataway, NJ 08854, USA.
Ann Biomed Eng. 2007 Jun;35(6):1053-67. doi: 10.1007/s10439-007-9268-z. Epub 2007 Mar 22.
One of the goals of systems biology is the identification of regulatory mechanisms that govern an organism's response to external stimuli. Transcription factors are hypothesized to be major contributors to this response, and a great deal of work has been done to predict the set of transcription factors that regulate a given gene. Most current methods seek to identify possible binding sites from genomic sequence. Initial attempts at predicting transcription factors from genomic sequence suffered from false positives. Making the problem more difficult, a predicted binding site may be a false positive even though the factor can be shown to bind the corresponding sequence in vitro. One method for rectifying this is phylogenetic analysis, in which only regions that show high evolutionary conservation are analyzed. However, such an approach may be too stringent because of the degeneracy of transcription factor binding site position weight matrices. Because of this degeneracy, only a few bases may need to be conserved across species. Therefore, a sequence that does not show a high level of evolutionary conservation may still show high affinity for the same transcription factor. In predicting transcription factor binding we explore the notion that "Co-expression implies co-regulation" [Allocco et al. BMC Bioinformatics 5:18, 2004]. When multiple co-expressed genes require similar transcription factor binding sites, there is a basis for eliminating false positives. This method allows for the selection of transcription factor binding sites that are active under a given experimental paradigm, thereby allowing us to indirectly incorporate the effects of chromosome and recognition site presentation upon transcription factor binding prediction. Rather than having to rationalize why a few transcription factor binding sites are over-represented in a cluster of genes, one can show that a few transcription factors are active in the cluster of genes that have been grouped together. Although the method focuses on predicting experiment-specific transcription factor binding sites, applying it iteratively across different experiments could yield a comprehensive set of transcription factor binding sites that regulate the dynamic responses of biological systems under a variety of conditions, and hence a more comprehensive model of transcriptional regulation.
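The idea that co-expression can be used to filter false positives can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify; it only assumes that sequence-based predictions are already available per gene and keeps the factors predicted for most genes in one co-expressed cluster. The function name `shared_factors`, the `min_fraction` threshold, and the gene and factor labels are all hypothetical.

```python
from collections import Counter


def shared_factors(predicted, cluster, min_fraction=0.8):
    """Return factors predicted for at least min_fraction of the cluster's genes.

    predicted: dict mapping gene name -> iterable of predicted factor names
               (assumed to come from any sequence-based predictor).
    cluster:   list of gene names grouped together by co-expression.
    """
    counts = Counter()
    for gene in cluster:
        # Count each factor at most once per gene.
        counts.update(set(predicted.get(gene, ())))
    threshold = min_fraction * len(cluster)
    return sorted(tf for tf, n in counts.items() if n >= threshold)


# Hypothetical per-gene predictions; TF3, TF4 and TF5 each appear for one gene only.
predicted = {
    "geneA": {"TF1", "TF2", "TF5"},
    "geneB": {"TF1", "TF3"},
    "geneC": {"TF1", "TF2"},
    "geneD": {"TF1", "TF2", "TF4"},
}
cluster = ["geneA", "geneB", "geneC", "geneD"]

print(shared_factors(predicted, cluster, min_fraction=0.75))
```

Factors that appear for only one gene in the cluster are dropped as likely false positives, while factors shared across the cluster are retained as candidates for the regulators active in that experiment. The threshold is an illustrative choice, not a value taken from the paper.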