Division of Biology, California Institute of Technology, Pasadena, CA 91125, USA.
Genome Res. 2011 Apr;21(4):566-77. doi: 10.1101/gr.104018.109. Epub 2011 Mar 7.
Cis-regulatory modules (CRMs) function by binding sequence-specific transcription factors, but the relationship between in vivo physical binding and the regulatory capacity of factor-bound DNA elements remains uncertain. We investigate this relationship for the well-studied Twist factor in Drosophila melanogaster embryos by analyzing genome-wide factor occupancy and testing the functional significance of Twist-occupied regions and of motifs within those regions. Twist ChIP-seq data efficiently identified previously studied Twist-dependent CRMs and robustly predicted new CRM activity in transgenesis, with newly identified Twist-occupied regions supporting diverse spatiotemporal patterns (>74% positive, n = 31). Some, but not all, candidate CRMs require Twist for proper expression in the embryo. The Twist motifs most favored in genome ChIP data (in vivo) differed from those most favored by Systematic Evolution of Ligands by EXponential enrichment (SELEX) (in vitro). Furthermore, the majority of ChIP-seq signals could be parsimoniously explained by a CABVTG motif (IUPAC notation: B = C, G, or T; V = A, C, or G) located within 50 bp of the ChIP summit, and of these, CACATG was the most prevalent. Mutagenesis experiments demonstrated that different Twist E-box motif types are not fully interchangeable, suggesting that the ChIP-derived consensus (CABVTG) includes sites with distinct regulatory outputs. Further analysis of position, frequency of occurrence, and sequence conservation revealed significant enrichment and conservation of CABVTG E-box motifs near Twist ChIP-seq signal summits, preferential conservation of the ±150 bp surrounding Twist-occupied summits, and enrichment of GA- and CA-repeat sequences near Twist-occupied summits. Our results show that high-resolution in vivo occupancy data can be used to drive efficient discovery and dissection of global and local cis-regulatory logic.
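The summit-centred motif analysis described in the abstract (counting CABVTG E-boxes within 50 bp of a ChIP-seq summit and asking which variant, such as CACATG, is most common) can be illustrated with a minimal Python sketch. It assumes a single sequence with a known 0-based summit offset, and counts a site when its centre lies within the window. The function names, the toy sequence, and the choice to collapse each site with its reverse complement (so CACATG and CATGTG count as one type) are illustrative assumptions, not the authors' pipeline. Because CABVTG is its own reverse complement as a degenerate pattern, scanning one strand finds sites in both orientations.

```python
import re
from collections import Counter

# IUPAC codes used by the CABVTG consensus (B = C/G/T, V = A/C/G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "B": "[CGT]", "V": "[ACG]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC motif such as 'CABVTG' into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())


def revcomp(kmer: str) -> str:
    """Reverse complement of an unambiguous DNA k-mer."""
    return kmer.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Assumption: treat a site and its reverse complement (e.g. CACATG/CATGTG) as one type."""
    return min(kmer, revcomp(kmer))


def ebox_types_near_summit(seq: str, summit: int, motif: str = "CABVTG",
                           window: int = 50) -> Counter:
    """Tally canonical motif instances whose centre lies within +/- window bp of summit.

    Hypothetical helper: seq is a plain genomic string, summit a 0-based offset into it.
    CABVTG is palindromic as a degenerate pattern, so one strand is scanned.
    """
    seq = seq.upper()
    half = len(motif) // 2
    # Zero-width lookahead so that overlapping matches are all reported.
    pattern = re.compile(f"(?=({iupac_to_regex(motif)}))")
    counts = Counter()
    for m in pattern.finditer(seq):
        centre = m.start() + half
        if abs(centre - summit) <= window:
            counts[canonical(m.group(1))] += 1
    return counts


# Toy example (made-up sequence): two E-boxes near offset 15, one CATGTG far away.
toy = "AAAA" + "CACATG" + "T" * 10 + "CAGCTG" + "A" * 100 + "CATGTG"
print(ebox_types_near_summit(toy, summit=15))   # CACATG and CAGCTG, one each
print(ebox_types_near_summit(toy, summit=128))  # the CATGTG site, reported as CACATG
```

Summing these per-summit counters over all ChIP-seq peaks would give a genome-wide tally of E-box types; a real analysis would also need a background model (for example, shuffled or flanking windows) to judge enrichment.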