Ng Felicia S L, Schütte Judith, Ruau David, Diamanti Evangelia, Hannah Rebecca, Kinston Sarah J, Göttgens Berthold
Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical Research, Cambridge University, Cambridge CB2 0XY, UK.
Nucleic Acids Res. 2014 Dec 16;42(22):13513-24. doi: 10.1093/nar/gku1254. Epub 2014 Nov 26.
Combinatorial transcription factor (TF) binding is essential for cell-type-specific gene regulation. However, much remains to be learned about the mechanisms of TF interactions, including the extent to which constrained spacing and orientation of interacting TFs are critical for regulatory element activity. To examine the relative prevalence of the 'enhanceosome' versus the 'TF collective' model of combinatorial TF binding, a comprehensive analysis of TF binding site sequences in large-scale datasets is necessary. We developed a motif-pair discovery pipeline to identify motif co-occurrences with preferential distance(s) between motifs in TF-bound regions. Utilizing a compendium of 289 mouse haematopoietic TF ChIP-seq datasets, we demonstrate that haematopoietic-related motif-pairs commonly occur with highly conserved constrained spacing and orientation between motifs. Furthermore, motif clustering revealed specific associations for both heterotypic and homotypic motif-pairs with particular haematopoietic cell types. We also show that disrupting the spacing between motifs significantly affects transcriptional activity in a well-known motif-pair (E-box and GATA) and in two previously unknown motif-pairs with constrained spacing (Ets and Homeobox, and Ets and E-box). In this study, we provide evidence for widespread sequence-specific TF pair interaction with DNA that conforms to the 'enhanceosome' model, and furthermore identify associations between specific haematopoietic cell types and motif-pairs.
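The idea of a preferential distance between two motifs can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy sequences, and the regular-expression motif definitions (E-box as CANNTG, GATA as the literal GATA) are assumptions made for illustration only. It scans the forward strand, uses exact matches rather than position weight matrices, and applies no statistical test for enrichment.

```python
import re
from collections import Counter

# Hypothetical, simplified motif definitions (assumptions, not from the paper).
EBOX = "CA..TG"   # E-box consensus CANNTG written as a regex
GATA = "GATA"     # core GATA motif

def motif_starts(seq, pattern):
    """Return start positions of all (possibly overlapping) matches."""
    # A lookahead makes each match zero-width, so overlapping sites are found.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", seq)]

def spacing_histogram(seqs, pat_a, pat_b, max_dist=50):
    """Count signed start-to-start distances (b - a) across all sequences."""
    counts = Counter()
    for seq in seqs:
        starts_a = motif_starts(seq, pat_a)
        starts_b = motif_starts(seq, pat_b)
        for a in starts_a:
            for b in starts_b:
                d = b - a
                # Skip d == 0 so a site is never paired with itself
                # when the same motif is used for both patterns.
                if d != 0 and abs(d) <= max_dist:
                    counts[d] += 1
    return counts

# Toy "TF-bound regions" (invented sequences, not real ChIP-seq data).
toy_regions = [
    "TTCAGCTGAAACCGGTTAGATAAGG",
    "GGCATATGCCTTCCTTCCGATATC",
    "CACGTGTTGATACCCCCC",
]

hist = spacing_histogram(toy_regions, EBOX, GATA)
# The most frequent distance is the candidate preferential spacing.
best_distance, best_count = hist.most_common(1)[0]
print(best_distance, best_count)
```

In a real analysis the observed counts at each distance would need to be compared against a background expectation, and both strands and motif orientations would need to be considered, before calling any spacing constrained.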