Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts 02138, USA.
Department of Biological and Biomedical Sciences, Harvard Medical School, Boston, Massachusetts 02115, USA.
Genome Res. 2019 Mar;29(3):344-355. doi: 10.1101/gr.242222.118. Epub 2019 Jan 25.
Transcription initiates at both coding and noncoding genomic elements, including mRNA and long noncoding RNA (lncRNA) core promoters and enhancer RNAs (eRNAs). However, each class has a different expression profile, with lncRNAs and eRNAs being the most tissue specific. How these complex differences in expression profiles and tissue specificities are encoded in a single DNA sequence remains unresolved. Here, we address this question using computational approaches and massively parallel reporter assays (MPRA) surveying hundreds of promoters and enhancers. We find that both divergent lncRNA and mRNA core promoters have higher capacities to drive transcription than nondivergent lncRNA and mRNA core promoters, respectively. Conversely, intergenic lncRNAs (lincRNAs) and eRNAs have lower capacities to drive transcription and are more tissue specific than divergent genes. This higher tissue specificity is strongly associated with less complex transcription factor (TF) motif profiles at the core promoter. We experimentally validated these findings by testing both engineered single-nucleotide deletions and human single-nucleotide polymorphisms (SNPs) in MPRA. In both cases, we observe that single nucleotides associated with many TF motifs are important drivers of promoter activity. Thus, we suggest that high TF motif density serves as a robust mechanism to increase promoter activity. Moreover, we find that 22% of common SNPs in core promoter regions have significant regulatory effects. Collectively, our findings show that high TF motif density provides redundancy and increases promoter activity at the expense of tissue specificity, suggesting that specificity of expression may be regulated by simplicity of motif usage.
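The abstract relates three quantities: TF motif density at a core promoter, the number of motifs a single nucleotide participates in, and tissue specificity of expression. The Python sketch below shows one way to compute each, under stated assumptions. It assumes motif hits have already been produced by some motif scanner (the abstract does not say which), represents them with a hypothetical `MotifHit` type, and uses the widely used tau index as an example tissue-specificity metric. Tau is not necessarily the metric used in the study, and it is applied here to raw rather than log-transformed expression. All function names and the toy motif hits are hypothetical illustrations, not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotifHit:
    """Hypothetical TF motif match on a promoter (0-based, end-exclusive)."""
    tf: str
    start: int
    end: int


def motif_density(hits, promoter_len):
    """Motif hits per base pair across the core promoter window."""
    return len(hits) / promoter_len


def per_nucleotide_overlap(hits, promoter_len):
    """Count how many motif hits cover each position.

    A deletion or SNP at a position with a high count would disrupt
    several motifs at once.
    """
    counts = [0] * promoter_len
    for hit in hits:
        # Clip each hit to the promoter window before counting.
        for i in range(max(hit.start, 0), min(hit.end, promoter_len)):
            counts[i] += 1
    return counts


def most_overlapped_positions(hits, promoter_len):
    """Return the positions covered by the largest number of motif hits."""
    counts = per_nucleotide_overlap(hits, promoter_len)
    peak = max(counts)
    return [i for i, c in enumerate(counts) if c == peak]


def tau(expression):
    """Tau tissue-specificity index: 0 = uniform across tissues, 1 = one tissue only.

    Assumption: raw expression values; many analyses log-transform first.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("need expression in at least two tissues")
    peak = max(expression)
    if peak <= 0:
        raise ValueError("expression must have a positive maximum")
    return sum(1 - x / peak for x in expression) / (n - 1)


if __name__ == "__main__":
    # Toy, hypothetical motif hits on a 20-bp core promoter.
    hits = [MotifHit("SP1", 2, 10), MotifHit("NRF1", 5, 15), MotifHit("ETS", 8, 14)]
    print(motif_density(hits, 20))              # 0.15
    print(most_overlapped_positions(hits, 20))  # [8, 9]
    print(tau([10, 10, 10, 10]), tau([10, 0, 0, 0]))  # 0.0 1.0
```

In this toy example, positions 8 and 9 each fall inside three motifs, so a single-nucleotide change there would disrupt more motifs than a change anywhere else in the window. This mirrors the abstract's observation that nucleotides associated with many motifs drive promoter activity, without implying that this is how the study scored them.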