Blusch J H, Haltmeier M, Frech K, Sander I, Leib-Mösch C, Brack-Werner R, Werner T
GSF-National Research Center for Environment and Health, Institute of Mammalian Genetics, Neuherberg, Germany.
Genomics. 1997 Jul 1;43(1):52-61. doi: 10.1006/geno.1997.4790.
Current genome sequencing projects are revealing megabases of uncharacterized genomic sequence, of which about 1% can be expected to be of retroviral origin. Such sequences often carry extensive deletions or mutations, so their retroviral origin can be very difficult to establish in the absence of convincing overall sequence similarity. Many copies of solo LTRs (long terminal repeats) are also distributed throughout genomic sequences. LTR and envelope sequences are in general among the most divergent parts of the retroviral genome and are therefore especially hard to detect in mutated endogenous sequences. We took advantage of the fact that these parts of the retroviral genome contain short, highly conserved regions that serve as retroviral hallmarks even after overall similarity has been lost. We defined several sequence elements and peptide motifs within LTR and Env sequences and used them to construct models for the LTRs and Env proteins of mammalian C-type retroviruses. With this strategy we successfully identified the hitherto missing LTRs and an env-like region in the human retroviral sequence S71. Our approach provides a new strategy for identifying remotely related retroviral sequences in genomic DNA, especially human DNA, and is of potential significance for interpreting the genomic sequences obtained from current large-scale sequencing projects.
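The general idea of describing a region as an ordered series of short conserved elements, rather than relying on overall similarity, can be sketched roughly as follows. This is only a toy illustration and not the models described in the abstract: the `Element` class, the `find_model_hits` function, the three patterns in `TOY_LTR_MODEL`, and all distance limits are assumptions chosen for demonstration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str      # label for the conserved element
    pattern: str   # regular expression over A/C/G/T
    max_gap: int   # max bases between the previous element's end and this one's start


# Hypothetical toy model: patterns and gap limits are illustrative assumptions,
# not the elements defined in the paper.
TOY_LTR_MODEL = [
    Element("tata_like", "TATA[AT]A", 0),   # max_gap is ignored for the first element
    Element("polyA_signal", "AATAAA", 60),
    Element("ca_terminus", "CA", 120),
]


def find_model_hits(seq, model):
    """Return every chain of elements found in model order within the gap limits.

    Each hit is a list of (name, start, end) tuples with 0-based, end-exclusive
    coordinates. For each occurrence of the first element, the nearest
    downstream occurrence of each following element is tried.
    """
    seq = seq.upper()
    compiled = [re.compile(e.pattern) for e in model]
    hits = []
    for first in compiled[0].finditer(seq):
        chain = [(model[0].name, first.start(), first.end())]
        pos = first.end()
        for elem, regex in zip(model[1:], compiled[1:]):
            match = regex.search(seq, pos)
            # Give up on this chain if the element is missing or too far away.
            if match is None or match.start() - pos > elem.max_gap:
                chain = None
                break
            chain.append((elem.name, match.start(), match.end()))
            pos = match.end()
        if chain is not None:
            hits.append(chain)
    return hits


if __name__ == "__main__":
    example = "GGGGTATAAACCCCCAATAAAGGGCATTT"
    for hit in find_model_hits(example, TOY_LTR_MODEL):
        print(hit)
```

Because only the short elements and their order and spacing are checked, the sequence between them is free to diverge, which is the property that makes this kind of model tolerant of heavily mutated copies. A realistic model would need weighted or degenerate elements and scoring rather than exact regular expressions.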