Wang Jianxin, Song Lei, Gonder M Katherine, Azrak Sami, Ray David A, Batzer Mark A, Tishkoff Sarah A, Liang Ping
Department of Cancer Genetics, Roswell Park Cancer Institute, Elm and Carlton Streets, Buffalo, NY 14263, USA.
Gene. 2006 Jan 3;365:11-20. doi: 10.1016/j.gene.2005.09.031. Epub 2006 Jan 10.
Alu elements are the most active and predominant type of short interspersed elements (SINEs) in the human genome. Recently inserted Alu elements that are polymorphic for presence/absence contribute to genome diversity among human populations and are useful markers for population genetic studies. The objective of this study was to identify polymorphic Alu insertions through an in silico comparative genomics approach and to analyze their distribution across the human genome. By computationally comparing the public and Celera sequence assemblies of the human genome, we identified a total of 800 polymorphic Alu elements. To demonstrate the efficiency of this approach, we screened a randomly selected set of 16 of these 800 Alu insertion polymorphisms against a human diversity panel using polymerase chain reaction-based assays. Based on sequence analysis of the 800 Alu polymorphisms, we report three new Alu subfamilies, Ya3, Ya4b, and Yb11, with Yb11 being the smallest known Alu subfamily. Analysis of retrotransposition activity identified Yb11, Ya8, Ya5, Yb9, and Yb8 as the most active Alu subfamilies; it also indicated that S subfamilies either maintain a very low level of retrotransposition activity or have been involved in recent gene conversion events. The 800 polymorphic Alu insertions are characterized by the presence of target site duplications (TSDs) and by polyA tails that are longer than average. Their pre-integration sites largely follow an extended "NT-AARA" motif. Among chromosomes, the density of Alu insertion polymorphisms is positively correlated with Alu-site availability and inversely correlated with the densities of older Alu elements and genes.
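As an illustration of the "NT-AARA" pre-integration motif mentioned above, the following is a minimal sketch of how candidate sites could be located in a DNA sequence. It is not the authors' pipeline. It assumes that the motif is written in IUPAC nucleotide codes (N = any base, R = A or G) and that the dash marks the cleavage position rather than a base; the function names `motif_to_regex` and `find_sites` are hypothetical.

```python
import re
from typing import List

# Assumption: the motif uses IUPAC codes. Only the codes needed here are listed.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "N": "[ACGT]"}


def motif_to_regex(motif: str) -> "re.Pattern[str]":
    """Compile an IUPAC motif into a regex.

    The '-' character is treated as a position marker (assumed here to
    denote the nick site) and is skipped, so it matches no base.
    """
    body = "".join(IUPAC[base] for base in motif if base != "-")
    # A lookahead makes the search report overlapping occurrences too.
    return re.compile("(?=(" + body + "))")


def find_sites(sequence: str, motif: str = "NT-AARA") -> List[int]:
    """Return 0-based start positions of every motif match on the given strand."""
    pattern = motif_to_regex(motif)
    return [m.start() for m in pattern.finditer(sequence.upper())]
```

For example, `find_sites("GGCTAAGACC")` returns `[2]`, because the six bases starting at position 2 (`CTAAGA`) fit the pattern. The sketch scans only the strand it is given; a fuller analysis would presumably also scan the reverse complement.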