INTEC Systems Institute Inc., Koto-ku 136-0075, Japan.
Nucleic Acids Res. 2010 Mar;38(4):1163-71. doi: 10.1093/nar/gkp1098. Epub 2009 Dec 3.
More than 40% of the human genome has been generated by retrotransposition, a series of in vivo processes involving reverse transcription of RNA molecules and integration of the resulting DNA copies into the genomic sequence. The mechanism of retrotransposition, however, is not fully understood, and additional genomic elements generated by retrotransposition may remain to be discovered. Here, we report that the human genome contains many previously unidentified short pseudogenes generated by retrotransposition of mRNAs. Genomic elements generated by non-long terminal repeat (non-LTR) retrotransposition carry specific sequence signatures: a poly-A tract immediately downstream of the inserted sequence, and a pair of duplicated sequences, called target site duplications (TSDs), flanking the insertion at either end. Using a new computer program, TSDscan, which accurately detects pseudogenes based on the presence of the poly-A tract and TSDs, we found 654 previously unknown short (≤300 bp) pseudogenes derived from mRNAs. Comprehensive analyses of these pseudogenes and their parent mRNAs revealed that pseudogene length depends on parent mRNA length: long mRNAs generate more short pseudogenes than short mRNAs do. To explain this phenomenon, we hypothesize that most long mRNAs are truncated before they are reverse transcribed. Truncated mRNAs would be rapidly degraded during reverse transcription, resulting in the generation of short pseudogenes.
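To make the poly-A/TSD signature concrete, here is a minimal Python sketch of how such a signature could be searched for. It is not TSDscan's algorithm: the function `find_candidates`, the `Candidate` record, and the thresholds (`min_polya=10`, `tsd_len=10`, `max_insert=400`) are hypothetical choices for illustration. The sketch scans the forward strand only, requires the two TSD copies to match exactly, and treats the inserted element as everything between the 5' TSD copy and the end of the poly-A tract.

```python
import re
from typing import List, NamedTuple


class Candidate(NamedTuple):
    tsd: str             # duplicated target-site sequence
    left_tsd_start: int  # 0-based start of the 5' TSD copy
    insert_start: int    # first base of the inserted element
    polya_end: int       # end (exclusive) of the poly-A tract; the 3' TSD starts here
    insert_len: int      # inserted element length, poly-A tract included


def find_candidates(seq: str, min_polya: int = 10, tsd_len: int = 10,
                    max_insert: int = 400) -> List[Candidate]:
    """Report poly-A tracts flanked by an exact duplicate of length tsd_len.

    Hypothetical sketch: forward strand only, exact-match TSDs only,
    fixed TSD length; thresholds are illustrative assumptions.
    """
    seq = seq.upper()
    hits: List[Candidate] = []
    for run in re.finditer("A{%d,}" % min_polya, seq):
        polya_end = run.end()
        # The 3' TSD copy is assumed to start immediately after the poly-A tract.
        right_tsd = seq[polya_end:polya_end + tsd_len]
        if len(right_tsd) < tsd_len:
            continue  # not enough sequence after the poly-A tract
        # The 5' TSD copy must end before the poly-A tract starts, and the
        # element between the two copies may be at most max_insert bp long.
        lo = max(0, polya_end - max_insert - tsd_len)
        hi = run.start() - tsd_len
        for i in range(lo, hi + 1):
            if seq[i:i + tsd_len] == right_tsd:
                start = i + tsd_len
                hits.append(Candidate(right_tsd, i, start, polya_end,
                                      polya_end - start))
    return hits


# Usage on a synthetic insertion: flank + TSD + element(+poly-A) + TSD + flank.
flank5, tsd, flank3 = "CCGGTTGCAC", "TTCGACGTCA", "GGCCTTAGCC"
element = "GATCCGTAGCTTGCACGT" + "A" * 15
hits = find_candidates(flank5 + tsd + element + tsd + flank3)
print(hits)
# [Candidate(tsd='TTCGACGTCA', left_tsd_start=10, insert_start=20, polya_end=53, insert_len=33)]
```

A real detector would also need to handle the reverse strand (where the signature appears as a poly-T tract at the 5' end), tolerate mismatches between TSD copies, and filter out chance matches; the exhaustive upstream search here is kept simple for readability rather than speed.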