Gasior Stephen L, Preston Graeme, Hedges Dale J, Gilbert Nicolas, Moran John V, Deininger Prescott L
Tulane Cancer Center and Department of Epidemiology, Tulane University Health Sciences Center SL-66, 1430 Tulane Ave., New Orleans, LA 70112, United States.
Gene. 2007 Apr 1;390(1-2):190-8. doi: 10.1016/j.gene.2006.08.024. Epub 2006 Sep 12.
The human Long Interspersed Element-1 (LINE-1) and the Short Interspersed Element (SINE) Alu together comprise 28% of the human genome. They share the same L1-encoded endonuclease for insertion, which recognizes an A+T-rich sequence. Under a simple model of insertion distribution, this nucleotide preference would lead to the prediction that the populations of both elements would be biased towards A+T-rich regions. Genomic L1 elements do show an A+T-rich bias. In contrast, Alu is biased towards G+C-rich regions when compared to the genome average. Several analyses have demonstrated that relatively recent insertions of both elements show less G+C content bias relative to older elements. We have analyzed the repetitive element and G+C composition of more than 100 pre-insertion loci derived from de novo L1 insertions in cultured human cancer cells, which should represent an evolutionarily unbiased set of insertions. An A+T-rich bias is observed in the 50 bp flanking the endonuclease target site, consistent with the known target site for the L1 endonuclease. The L1, Alu, and G+C content of 20 kb of the de novo pre-insertion loci shows a different set of biases from that observed for fixed L1s in the human genome. In contrast to the insertion sites of genomic L1s, the de novo L1 pre-insertion loci are relatively L1-poor, Alu-rich and G+C neutral. Finally, a statistically significant cluster of de novo L1 insertions was localized in the vicinity of the c-myc gene. These results suggest that the initial insertion preference of L1, while A+T-rich in the immediate vicinity of the break site, can be influenced by the broader content of the flanking genomic region, and have implications for understanding the dynamics of L1 and Alu distributions in the human genome.
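The abstract contrasts base composition at two scales: a narrow window around the endonuclease target site and a much broader flanking region. As an illustration only, and not the authors' pipeline, the following minimal Python sketch shows how a G+C fraction could be computed for windows of different sizes around a site. The function names, the toy sequence, and the window sizes are all assumptions made for this example; the study itself reports 50 bp and 20 kb windows on real genomic sequence.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / total


def flank_gc(chrom_seq: str, site: int, flank: int) -> float:
    """G+C fraction of the window spanning `flank` bases on each side of a 0-based site.

    The window is clipped at the ends of the sequence.
    """
    start = max(0, site - flank)
    end = min(len(chrom_seq), site + flank)
    return gc_fraction(chrom_seq[start:end])


# Toy sequence, made up for illustration: an A+T-rich core in a G+C-rich context.
toy = "GCGCGCGCGC" + "TTTTAAAA" + "GCGCGCGCGC"
site = 14  # hypothetical insertion point in the middle of the A+T-rich core

local = flank_gc(toy, site, 4)   # narrow window: covers only the A+T-rich core
broad = flank_gc(toy, site, 14)  # broad window: covers the whole toy sequence

print(f"local G+C: {local:.2f}, broad G+C: {broad:.2f}")
```

In this toy case the narrow window is entirely A+T while the broad window is mostly G+C, which mirrors the abstract's point that local and regional composition around an insertion site can differ.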