Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, NO-7491, Trondheim, Norway.
BMC Genomics. 2012 Aug 3;13:372. doi: 10.1186/1471-2164-13-372.
Context-dependent transcription factor (TF) binding is one reason for differences in gene expression patterns between cellular states. Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) identifies genome-wide TF binding sites for one particular context: the cells used in the experiment. But can such ChIP-seq data predict TF binding in other cellular contexts, and is it possible to distinguish context-dependent from ubiquitous TF binding?
We compared ChIP-seq data on TF binding for multiple TFs in two different cell types and found that, on average, only a third of ChIP-seq peak regions are common to both cell types. As expected, common peaks occur more frequently in certain genomic contexts, such as CpG-rich promoters, whereas chromatin differences characterize cell-type-specific TF binding. We also find, however, that genotype differences between the cell types can explain differences in binding. Moreover, ChIP-seq signal intensity and peak clustering are the strongest predictors of common peaks. Compared with strong peaks located in regions containing peaks for multiple transcription factors, weak and isolated peaks are less often shared between the cell types and are less associated with data that indicate regulatory activity.
Together, the results suggest that experimental noise is prevalent among weak peaks, whereas strong and clustered peaks represent high-confidence binding events that often occur in other cellular contexts. Nevertheless, 30-40% of the strongest and most clustered peaks show context-dependent regulation. We show that by combining signal intensity with additional data, ranging from context-independent information (such as binding site conservation and position weight matrix scores) to context-dependent chromatin structure, we can predict whether a ChIP-seq peak is likely to be present in other cellular contexts.
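The notion of a peak being "common" to two cell types can be illustrated with a minimal sketch. The abstract does not state the overlap criterion used in the study, so the rule below (any overlap of at least one base pair on the same chromosome) is an assumption, as are the function names `overlaps` and `common_fraction` and the toy coordinates.

```python
# Minimal sketch: fraction of peaks in one cell type that are also found in another.
# Assumption: peaks are (chrom, start, end) tuples with half-open coordinates, and
# two peaks count as "common" if they overlap by at least one base pair.

def overlaps(peak_a, peak_b):
    """Return True if two peaks lie on the same chromosome and share >= 1 bp."""
    chrom_a, start_a, end_a = peak_a
    chrom_b, start_b, end_b = peak_b
    return chrom_a == chrom_b and start_a < end_b and start_b < end_a


def common_fraction(peaks_a, peaks_b):
    """Fraction of peaks in peaks_a that overlap at least one peak in peaks_b."""
    if not peaks_a:
        return 0.0
    shared = sum(any(overlaps(p, q) for q in peaks_b) for p in peaks_a)
    return shared / len(peaks_a)


# Toy, made-up coordinates for illustration only (not data from the study).
cell_type_1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
cell_type_2 = [("chr1", 150, 250), ("chr2", 400, 500)]

print(common_fraction(cell_type_1, cell_type_2))
```

The pairwise scan is quadratic in the number of peaks, which is fine for a toy example; genome-scale peak sets would normally be sorted or indexed by chromosome first.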