The Francis Crick Institute, London NW1 1AT, United Kingdom.
Computational Regulatory Genomics, MRC London Institute of Medical Sciences, London W12 0NN, United Kingdom.
Genome Res. 2018 May;28(5):676-688. doi: 10.1101/gr.231449.117. Epub 2018 Apr 4.
Understanding the molecular mechanisms and evolution of the gene regulatory system remains a major challenge in biology. Transcription start sites (TSSs) are especially interesting because they are central to initiating gene expression. Previous studies revealed widespread transcription initiation and fast turnover of TSSs in mammalian genomes. Yet, how new TSSs originate and how they evolve over time remain poorly understood. To address these questions, we analyzed ∼200,000 human TSSs by integrating evolutionary (inter- and intra-species) and functional genomic data, particularly focusing on evolutionarily young TSSs that emerged in the primate lineage. TSSs were grouped according to their evolutionary age using sequence alignment information as a proxy. Comparisons of young and old TSSs revealed that (1) new TSSs emerge through a combination of intrinsic factors, like the sequence properties of transposable elements and tandem repeats, and extrinsic factors such as their proximity to existing regulatory modules; (2) new TSSs undergo rapid evolution that reduces the inherent instability of repeat sequences associated with a high propensity of TSS emergence; and (3) once established, the transcriptional competence of surviving TSSs is gradually enhanced, with evolutionary changes subject to temporal (fewer regulatory changes in younger TSSs) and spatial constraints (fewer regulatory changes in more isolated TSSs). These findings advance our understanding of how regulatory innovations arise in the genome throughout evolution and highlight the genomic robustness and evolvability in these processes.