Li Xiao-yong, MacArthur Stewart, Bourgon Richard, Nix David, Pollard Daniel A, Iyer Venky N, Hechmer Aaron, Simirenko Lisa, Stapleton Mark, Luengo Hendriks Cris L, Chu Hou Cheng, Ogawa Nobuo, Inwood William, Sementchenko Victor, Beaton Amy, Weiszmann Richard, Celniker Susan E, Knowles David W, Gingeras Tom, Speed Terence P, Eisen Michael B, Biggin Mark D
Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America.
PLoS Biol. 2008 Feb;6(2):e27. doi: 10.1371/journal.pbio.0060027.
Identifying the genomic regions bound by sequence-specific regulatory factors is central both to deciphering the complex DNA cis-regulatory code that controls transcription in metazoans and to determining the range of genes that shape animal morphogenesis. We used whole-genome tiling arrays to map sequences bound in Drosophila melanogaster embryos by the six maternal and gap transcription factors that initiate anterior-posterior patterning. We find that these sequence-specific DNA binding proteins bind with quantitatively different specificities to highly overlapping sets of several thousand genomic regions in blastoderm embryos. Specific high- and moderate-affinity in vitro recognition sequences for each factor are enriched in bound regions. This enrichment, however, is not sufficient to explain the pattern of binding in vivo and varies in a context-dependent manner, demonstrating that higher-order rules must govern targeting of transcription factors. The more highly bound regions include all of the over 40 well-characterized enhancers known to respond to these factors as well as several hundred putative new cis-regulatory modules clustered near developmental regulators and other genes with patterned expression at this stage of embryogenesis. The new targets include most of the microRNAs (miRNAs) transcribed in the blastoderm, as well as all major zygotically transcribed dorsal-ventral patterning genes, whose expression we show to be quantitatively modulated by anterior-posterior factors. In addition to these highly bound regions, there are several thousand regions that are reproducibly bound at lower levels. However, these poorly bound regions are, collectively, far more distant from genes transcribed in the blastoderm than highly bound regions; are preferentially found in protein-coding sequences; and are less conserved than highly bound regions. 
Together, these observations suggest that many of these poorly bound regions are not involved in early-embryonic transcriptional regulation and that a significant proportion may be nonfunctional. Surprisingly, for five of the six factors, the recognition sites are not unambiguously more constrained evolutionarily than the immediate flanking DNA, even in more highly bound and presumably functional regions, indicating that comparative DNA sequence analysis is limited in its ability to identify functional transcription factor targets.