Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, Illinois, United States of America.
PLoS One. 2009 Dec 1;4(12):e8155. doi: 10.1371/journal.pone.0008155.
BACKGROUND: How transcription factors (TFs) interact with cis-regulatory sequences and with each other is a fundamental, but not well understood, aspect of gene regulation.
METHODOLOGY/PRINCIPAL FINDINGS: We present a computational method to address this question, relying on established biophysical principles. This method, STAP (sequence to affinity prediction), takes into account all combinations and configurations of strong and weak binding sites when analyzing large-scale TF-DNA binding data. It can discover cooperative interactions among TFs, infer sequence rules of interaction, and predict TF target genes in new conditions for which no TF-DNA binding data are available. STAP differs from other statistical approaches for analyzing cis-regulatory sequences in its use of physical principles and its treatment of DNA binding data as a quantitative representation of binding strength. Applying this method to ChIP-seq data for 12 TFs in mouse embryonic stem (ES) cells, we found that the strength of TF-DNA binding could be significantly modulated by cooperative interactions among TFs with adjacent binding sites. However, further analysis of five putatively interacting TF pairs suggests that such interactions may be relatively insensitive to the distance and orientation of binding sites. Testing a set of putative Nanog motifs, STAP showed that a novel Nanog motif explained the ChIP-seq data better than previously published ones. We then experimentally tested and verified the new Nanog motif. A series of comparisons showed that STAP has more predictive power than several state-of-the-art methods for cis-regulatory sequence analysis. We took advantage of this power to study the evolution of TF-target relationships in Drosophila. By learning TF-DNA interaction models from ChIP-chip data of D. melanogaster (Mel) and applying them to the genome of D. pseudoobscura (Pse), we found that only about half of the sequences strongly bound by TFs in Mel have high binding affinities in Pse. We show that prediction of functional TF targets from ChIP-chip data can be improved by using the conservation of STAP-predicted affinities as an additional filter.
CONCLUSIONS/SIGNIFICANCE: STAP is an effective method to analyze binding site arrangements, TF cooperativity, and TF target genes from genome-wide TF-DNA binding data.
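The idea of summing over all configurations of strong and weak binding sites can be illustrated with a small thermodynamic sketch. The code below is not STAP's implementation. It is a minimal single-TF example that assumes each candidate site receives a Boltzmann-like weight derived from a position weight matrix, and it omits cooperativity and multiple TFs entirely. The function names, the `scale` parameter, and the toy motif are hypothetical and chosen only for illustration.

```python
import math


def site_weights(seq, pwm, background=0.25, scale=1.0):
    """Weight of each candidate site: scale * exp(log-likelihood ratio).

    pwm is a list of dicts mapping 'A', 'C', 'G', 'T' to probabilities
    (one dict per motif position). seq must contain only those letters.
    'scale' stands in for a TF concentration / binding constant (assumption).
    """
    length = len(pwm)
    weights = []
    for start in range(len(seq) - length + 1):
        llr = sum(
            math.log(pwm[k][seq[start + k]] / background) for k in range(length)
        )
        weights.append(scale * math.exp(llr))
    return weights


def expected_occupancy(weights, site_len):
    """Expected number of bound TF molecules, summed over ALL configurations
    of non-overlapping sites, computed by dynamic programming.

    z[i]: total statistical weight of configurations on the first i bases.
    w[i]: sum over those configurations of weight * (number of bound sites).
    """
    n = len(weights) + site_len - 1  # sequence length implied by the weights
    z = [1.0] * (n + 1)
    w = [0.0] * (n + 1)
    for i in range(1, n + 1):
        # Case 1: base i is not the last base of a bound site.
        z[i] = z[i - 1]
        w[i] = w[i - 1]
        # Case 2: a site occupies bases start .. i-1.
        start = i - site_len
        if start >= 0:
            q = weights[start]
            z[i] += q * z[start]
            w[i] += q * (w[start] + z[start])
    return w[n] / z[n]


if __name__ == "__main__":
    # Hypothetical 3-bp motif that prefers "TGA"; not a real TF motif.
    toy_pwm = [
        {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
        {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
        {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    ]
    for seq in ("CCTGACCTGACC", "CCCCCCCCCCCC"):
        occ = expected_occupancy(site_weights(seq, toy_pwm), len(toy_pwm))
        print(seq, round(occ, 3))
```

Because every subset of non-overlapping sites contributes to the sum, many weak sites can jointly raise the predicted occupancy of a sequence even when no single site scores highly, which is the behavior a threshold-based site count would miss.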