Department of Genome Dynamics, Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA.
Genome Res. 2011 Feb;21(2):182-92. doi: 10.1101/gr.112466.110. Epub 2010 Dec 22.
Core promoters are critical regions for gene regulation in higher eukaryotes. However, the boundaries of promoter regions, the relative rates of initiation at the transcription start sites (TSSs) distributed within them, and the functional significance of promoter architecture remain poorly understood. We produced a high-resolution map of promoters active in the Drosophila melanogaster embryo by integrating data from three independent and complementary methods: 21 million cap analysis of gene expression (CAGE) tags, 1.2 million RNA ligase-mediated rapid amplification of cDNA ends (RLM-RACE) reads, and 50,000 cap-trapped expressed sequence tags (ESTs). We defined 12,454 promoters of 8,037 genes. Our analysis indicates that, due to non-promoter-associated RNA background signal, previous studies have likely overestimated the number of promoter-associated CAGE clusters by fivefold. We show that TSS distributions form a complex continuum of shapes, and that promoters active in the embryo and adult have highly similar shapes in 95% of cases. This suggests that these distributions are generally determined by static elements such as local DNA sequence and are not modulated by dynamic signals such as histone modifications. Transcription factor binding motifs are differentially enriched as a function of promoter shape, and peaked promoter shape is correlated with both temporal and spatial regulation of gene expression. Our results contribute to the emerging view that core promoters are functionally diverse and control patterning of gene expression in Drosophila and mammals.