BMC Genomics. 2013 Mar 4;14:142. doi: 10.1186/1471-2164-14-142.
Tandem repeats are ubiquitous and abundant in higher eukaryotic genomes and, along with transposable elements, constitute much of the DNA underlying centromeres and other heterochromatic domains. In maize, the centromeric satellite repeat (CentC) and centromeric retrotransposons (CR), a class of Ty3/gypsy retrotransposons, are enriched at centromeres. Some satellite repeats have homology to retrotransposons, and several mechanisms have been proposed to explain the expansion, contraction, and homogenization of tandem repeats. However, the origin and evolution of tandem repeat loci remain largely unknown.
CRM1TR and CRM4TR are novel tandem repeats that we show to be entirely derived from CR elements belonging to two different subfamilies, CRM1 and CRM4. Although these tandem repeats clearly originated in at least two separate events, they are derived from similar regions of their respective parent elements, namely the long terminal repeat (LTR) and untranslated region (UTR). The 5' ends of the monomer repeat units of CRM1TR and CRM4TR map to different locations within their respective LTRs, while their 3' ends map to the same relative position within a conserved region of their UTRs. Based on the insertion times of heterologous retrotransposons that have inserted into these tandem repeats, amplification of the repeats is estimated to have begun at least ~4 (CRM1TR) and ~1 (CRM4TR) million years ago. Distinct CRM1TR sequence variants occupy the two CRM1TR loci, indicating that there is little or no movement of repeats between loci, even though the loci are separated by only ~1.4 Mb.
The discovery of two novel retrotransposon-derived tandem repeats supports the conclusions of earlier studies that retrotransposons can give rise to tandem repeats in eukaryotic genomes. Analysis of monomers from two different CRM1TR loci shows that gene conversion is the major cause of sequence variation. We propose that successive intrastrand deletions generated the initial repeat structure, and that gene conversions increased the size of each tandem repeat locus.
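The minimum ages quoted above rest on dating retrotransposons that inserted into the repeat arrays: an array must be at least as old as the oldest element nested inside it. The abstract does not say how the insertion times were computed, so the following is only a minimal sketch of one common approach, which dates an LTR retrotransposon from the divergence between its own two LTRs (identical at insertion) as T = K / (2r). The Kimura 2-parameter distance, the default rate of 1.3e-8 substitutions per site per year (a value often used for grass LTR retrotransposons), and all function names are assumptions made for illustration, not details taken from the paper.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(ltr_5: str, ltr_3: str) -> float:
    """Kimura 2-parameter distance between two pre-aligned LTR sequences.

    Columns containing gaps or ambiguous bases are skipped.
    """
    if len(ltr_5) != len(ltr_3):
        raise ValueError("sequences must be aligned to the same length")

    sites = transitions = transversions = 0
    for a, b in zip(ltr_5.upper(), ltr_3.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if sites == 0:
        raise ValueError("no comparable sites in the alignment")

    p = transitions / sites    # proportion of transitions
    q = transversions / sites  # proportion of transversions
    inner = (1 - 2 * p - q) * math.sqrt(max(1 - 2 * q, 0.0))
    if inner <= 0:
        raise ValueError("sequences too divergent for a K2P estimate")
    return -0.5 * math.log(inner)


def insertion_time_years(divergence: float, rate: float = 1.3e-8) -> float:
    """Insertion time T = K / (2r); `rate` is an assumed substitution rate."""
    return divergence / (2 * rate)


def minimum_repeat_age(nested_ltr_divergences, rate: float = 1.3e-8) -> float:
    """Lower bound on array age: the oldest element nested within the array."""
    return max(insertion_time_years(k, rate) for k in nested_ltr_divergences)
```

Under these assumptions, a nested element whose LTRs differ by K ≈ 0.104 substitutions per site would date to about 4 million years, and one with K ≈ 0.026 to about 1 million years. These divergence values are back-calculated from the assumed rate purely for illustration and are not measurements reported in the paper.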