Program in Bioinformatics and Integrative Biology, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA.
Genome Res. 2012 Sep;22(9):1798-812. doi: 10.1101/gr.139105.112.
Chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) has become the dominant technique for mapping transcription factor (TF) binding regions genome-wide. We performed an integrative analysis centered on 457 ChIP-seq data sets on 119 human TFs generated by the ENCODE Consortium. We identified highly enriched sequence motifs in most data sets, revealing new motifs and validating known ones. The motif sites (TF binding sites) are highly conserved evolutionarily and show distinct footprints upon DNase I digestion. We frequently detected secondary motifs in addition to the canonical motifs of the TFs, indicating tethered binding and cobinding between multiple TFs. We observed significant position and orientation preferences between many cobinding TFs. Genes specifically expressed in a cell line are often associated with a greater occurrence of nearby TF binding in that cell line. We observed cell-line-specific secondary motifs that mediate the binding of the histone deacetylase HDAC2 and the enhancer-binding protein EP300. TF binding sites are located in GC-rich, nucleosome-depleted, and DNase I-sensitive regions, flanked by well-positioned nucleosomes, and many of these features show cell-type specificity. The GC-richness may be beneficial for regulating TF binding because, when unoccupied by a TF, these regions are occupied by nucleosomes in vivo. We present the results of our analysis in a TF-centric web repository, Factorbook (http://factorbook.org), and will continually update this repository as more ENCODE data are generated.
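As a rough illustration of what "GC-rich" means around a binding site, the sketch below computes the GC fraction in windows along a DNA sequence. It is not the authors' pipeline: the function names `gc_fraction` and `gc_profile`, the window size, and the toy sequence are all assumptions made up for this example.

```python
from typing import List


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases; 0.0 if there are none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at  # ambiguous bases such as N are ignored
    return gc / total if total else 0.0


def gc_profile(seq: str, window: int, step: int = 1) -> List[float]:
    """GC fraction in sliding windows of length `window`, advancing by `step`."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence (hypothetical, not real data): AT-rich flanks, GC-rich center.
toy = "ATATATAT" + "GCGCGCGC" + "ATATATAT"
profile = gc_profile(toy, window=8, step=8)
print(profile)
```

In practice one would apply such a profile to genomic sequence centered on ChIP-seq peak summits and average across sites; that step depends on the peak and genome files at hand and is left out here.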