Comeaux Matthew S, Roy-Engel Astrid M, Hedges Dale J, Deininger Prescott L
Tulane Cancer Center and Dept. of Epidemiology, Tulane University Health Sciences Center, New Orleans, Louisiana 70112, USA.
Genome Res. 2009 Apr;19(4):545-55. doi: 10.1101/gr.089789.108. Epub 2009 Mar 9.
The human genome contains nearly 1.1 million Alu elements, comprising roughly 11% of its total DNA content. Alu elements use a copy-and-paste retrotransposition mechanism that can result in de novo disease insertion alleles. There are nearly 900,000 old Alu elements from subfamilies S and J that appear to be almost completely inactive, and about 200,000 from subfamily Y or younger, including a few thousand copies of the Ya5 subfamily, which accounts for the majority of current activity. Given the much higher copy number of the older Alu subfamilies, it is not known why all of the active Alu elements belong to the younger subfamilies. We present a systematic analysis evaluating the effect of the observed sequence variation in the different sections of an Alu element on retrotransposition. The length of the longest uninterrupted run of adenines in the A-tail, the degree of A-tail heterogeneity, the length of the 3' unique end after the A-tail and before the RNA polymerase III terminator, and random mutations found in the right monomer all modulate retrotransposition efficiency. These changes occur over different evolutionary time frames. The combined impact of sequence changes in all of these regions explains why young Alus are currently causing disease through retrotransposition, whereas the old Alus have lost their ability to retrotranspose. We present a predictive model to evaluate the retrotransposition capability of individual Alu elements and have successfully applied it to identify the first putative source element for a disease-causing Alu insertion in a patient with cystic fibrosis.
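The sequence features named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' predictive model: it only computes three descriptors from two input strings, and every function name, the definition of heterogeneity as the fraction of non-A bases, and the use of a run of four thymines as the RNA polymerase III terminator are assumptions made for illustration. The right-monomer mutations and any weighting of the features are not covered.

```python
import re
from typing import Optional


def longest_a_run(tail: str) -> int:
    """Length of the longest uninterrupted run of adenines in the A-tail."""
    runs = re.findall(r"A+", tail.upper())
    return max((len(run) for run in runs), default=0)


def a_tail_heterogeneity(tail: str) -> float:
    """Fraction of non-A bases in the A-tail (an assumed, simplified measure)."""
    tail = tail.upper()
    if not tail:
        return 0.0
    return sum(base != "A" for base in tail) / len(tail)


def unique_3prime_length(downstream: str, terminator: str = "TTTT") -> Optional[int]:
    """Bases between the end of the A-tail and the first terminator.

    `downstream` is the genomic sequence immediately after the A-tail.
    Returns None if no terminator is found in the supplied sequence.
    """
    index = downstream.upper().find(terminator)
    return None if index < 0 else index


def describe_alu_3prime(tail: str, downstream: str) -> dict:
    """Collect the three descriptors; no scoring or prediction is attempted."""
    return {
        "longest_a_run": longest_a_run(tail),
        "a_tail_heterogeneity": a_tail_heterogeneity(tail),
        "unique_3prime_length": unique_3prime_length(downstream),
    }


if __name__ == "__main__":
    # Hypothetical sequences, not taken from any real Alu locus.
    example_tail = "AAAAAAAAAAGAAAAATAAA"
    example_downstream = "GCTAGGCATTTTGC"
    print(describe_alu_3prime(example_tail, example_downstream))
```

Keeping the descriptors as separate functions means each one can be replaced with a more careful definition without touching the others.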