Simonti Corinne N, Pavlicev Mihaela, Capra John A
Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN.
Center for Prevention of Preterm Birth, Perinatal Institute, Cincinnati Children's Hospital Medical Center, Cincinnati, OH.
Mol Biol Evol. 2017 Nov 1;34(11):2856-2869. doi: 10.1093/molbev/msx219.
Transposable element (TE)-derived sequences make up approximately half of most mammalian genomes, and many TEs have been co-opted into gene regulatory elements. However, we lack a comprehensive tissue- and genome-wide understanding of how and when TEs gain regulatory activity in their hosts. We evaluated the prevalence of TE-derived DNA in enhancers and promoters across hundreds of human and mouse cell lines and primary tissues. Promoters are significantly depleted of TEs in all tissues relative to the overall prevalence of TEs in the genome (P < 0.001); enhancers are also depleted of TEs, though not as strongly as promoters. The degree of enhancer depletion also varies across contexts (1.5-3×), with reproductive and immune cells showing the highest levels of TE regulatory activity in humans. Overall, in spite of the regulatory potential of many TE sequences, they are significantly less active in gene regulation than expected from their prevalence. TE age is predictive of the likelihood of enhancer activity; TEs originating before the divergence of amniotes are 9.2 times more likely to have enhancer activity than TEs that integrated in great apes. Context-specific enhancers are more likely to be TE-derived than enhancers active in multiple tissues, and young TEs are more likely to overlap context-specific enhancers than old TEs (86% vs. 47%). Once TEs obtain enhancer activity in the host, they have similar functional dynamics to one another and to non-TE-derived enhancers, likely driven by pleiotropic constraints. However, a few TE families, most notably endogenous retroviruses, have greater regulatory potential. Our observations suggest a model of regulatory co-option in which TE-derived sequences are initially repressed, after which a small fraction obtains context-specific enhancer activity, with further gains subject to pleiotropic constraints.
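The fold depletion reported in the abstract compares how much of a regulatory region set is TE-derived with how much of the whole genome is TE-derived. The abstract does not describe the authors' pipeline, so the following is only a minimal sketch of one way such a ratio could be computed. The interval representation, the function names, and all coordinates are hypothetical toy values chosen for illustration, not data from the study.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single toy chromosome


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_bp(intervals: List[Interval]) -> int:
    """Number of bases covered by a set of intervals."""
    return sum(end - start for start, end in merge(intervals))


def overlap_bp(a: List[Interval], b: List[Interval]) -> int:
    """Number of bases covered by both interval sets (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            shared += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def fold_depletion(genome_len: int, tes: List[Interval],
                   regions: List[Interval]) -> float:
    """Genome-wide TE fraction divided by the TE fraction inside `regions`.

    Values above 1 mean the regions contain less TE-derived sequence
    than expected from the genome-wide prevalence.
    """
    genome_fraction = total_bp(tes) / genome_len
    region_fraction = overlap_bp(regions, tes) / total_bp(regions)
    return genome_fraction / region_fraction


# Hypothetical toy data: a 1,000 bp "genome" that is half TE-derived.
GENOME_LEN = 1000
toy_tes = [(0, 200), (300, 500), (700, 800)]
toy_enhancers = [(150, 250), (500, 600), (780, 820)]

fold = fold_depletion(GENOME_LEN, toy_tes, toy_enhancers)
print(f"{fold:.2f}")  # prints 1.71
```

In this toy case 70 of the 240 enhancer bases overlap a TE, against a genome-wide TE fraction of 0.5, which gives a fold depletion of about 1.71. A real analysis would also need a significance test (the abstract reports P < 0.001 but does not say how it was obtained) and genome-scale interval tooling rather than plain Python lists.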