National Center for Scientific Research "Demokritos," Institute of Biology, 153 10 Athens, Greece.
Gene. 2012 May 10;499(1):88-98. doi: 10.1016/j.gene.2012.02.005. Epub 2012 Feb 18.
Repetitive DNA sequences derived from transposable elements (TEs) are distributed non-randomly, co-clustering with other classes of repeat elements, genes and other genomic components. In previous work, we reported power-law-like size distributions (linearity in log-log scale) in the spatial arrangement of Alu and LINE1 elements in the human genome. Here we investigate the large-scale features of the spatial arrangement of all principal classes of TEs in 14 genomes from phylogenetically distant organisms by studying the size distribution of inter-repeat distances. Power-law-like size distributions are found to be widespread, extending up to several orders of magnitude. To understand the emergence of this distributional pattern, we introduce an evolutionary scenario that includes (i) insertions of DNA segments (e.g., more recent repeats) into the considered sequence and (ii) eliminations of members of the studied TE family. In the proposed model we also incorporate the potential for transposition events (characteristic of the DNA transposons' life cycle) and segmental duplications. Simulations reproduce the main features of the observed size distributions. Furthermore, we investigate the effects of various genomic features on the presence and extent of power-law size distributions, such as TE class and age, mode of parental TE transmission, GC content, and deletion and recombination rates in the studied genomic region. Our observations corroborate the hypothesis that insertions of genomic material and eliminations of repeats are at the basis of power-laws in inter-repeat distances. The existence of these power-laws could facilitate the formation of the recently proposed "fractal globule" for the confined chromatin organization.
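The insertion/elimination scenario can be illustrated with a toy simulation. The sketch below is not the authors' model; it is a minimal stand-in, assuming that the sequence is reduced to the list of distances between consecutive members of one TE family (repeat lengths ignored), that each insertion adds a segment of fixed hypothetical length `insert_len` at a uniformly random position (so a spacer is hit with probability proportional to its length), and that each elimination removes a random internal repeat, merging its two flanking distances. Transposition events and segmental duplications are left out. All function names and parameter values are hypothetical. The resulting size distribution is summarized as the cumulative count N(s) of distances of size at least s, with a least-squares slope in log-log scale as a crude check for linearity.

```python
import math
import random


def simulate_insertion_elimination(n_repeats=1000, initial_spacer=50,
                                   n_steps=5000, p_insert=0.7,
                                   insert_len=20, seed=0):
    """Toy insertion/elimination model (hypothetical parameters).

    Returns the final list of distances between consecutive members of one
    TE family. Repeat lengths are ignored: only the spacers are tracked.
    """
    if n_repeats < 2:
        raise ValueError("need at least two repeats to define a distance")
    rng = random.Random(seed)
    # Start from equally spaced repeats: n_repeats copies -> n_repeats - 1 spacers.
    spacers = [initial_spacer] * (n_repeats - 1)
    for _ in range(n_steps):
        if len(spacers) < 2 or rng.random() < p_insert:
            # (i) Insertion: a fixed-length segment lands at a uniformly random
            # position, so each spacer is hit with probability proportional to its length.
            j = rng.choices(range(len(spacers)), weights=spacers)[0]
            spacers[j] += insert_len
        else:
            # (ii) Elimination: a random internal repeat is removed, which merges
            # the two spacers on either side of it into a single longer distance.
            j = rng.randrange(len(spacers) - 1)
            spacers[j:j + 2] = [spacers[j] + spacers[j + 1]]
    return spacers


def cumulative_size_distribution(distances):
    """Return sorted (s, N) pairs, where N is the number of distances >= s."""
    counts = {}
    for rank, d in enumerate(sorted(distances, reverse=True), start=1):
        # Scanning in descending order, the last rank seen for a value d
        # equals the number of distances that are >= d.
        counts[d] = rank
    return sorted(counts.items())


def loglog_slope(points):
    """Least-squares slope of log10(N) against log10(s)."""
    if len(points) < 2:
        raise ValueError("need at least two distinct sizes to fit a slope")
    xs = [math.log10(s) for s, _ in points]
    ys = [math.log10(n) for _, n in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    distances = simulate_insertion_elimination()
    points = cumulative_size_distribution(distances)
    # Fit only the upper half of the distinct sizes (an arbitrary tail cut-off).
    tail = points[len(points) // 2:]
    print(f"{len(distances)} distances; log-log tail slope {loglog_slope(tail):.2f}")
```

Varying `p_insert` shifts the balance between the two processes. The fitted slope is only a rough summary: whether the tail is actually linear in log-log scale, and over how many orders of magnitude, should be judged by plotting `points` on log-log axes.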