Benachenhou Farid, Jern Patric, Oja Merja, Sperber Göran, Blikstad Vidar, Somervuo Panu, Kaski Samuel, Blomberg Jonas
Department of Medical Sciences, Section of Virology, Uppsala University, Uppsala, Sweden.
PLoS One. 2009;4(4):e5179. doi: 10.1371/journal.pone.0005179. Epub 2009 Apr 13.
Retroviral LTRs, paired or single, influence the transcription of both retroviral and non-retroviral genomic sequences. Vertebrate genomes contain many thousands of endogenous retroviruses (ERVs) and their LTRs. Single LTRs are difficult to detect in genomic sequences without recourse to repetitiveness or presence in a proviral structure. A better understanding of LTR structure improves understanding of LTR function and of functional genomics. Here we develop models of orthoretroviral LTRs useful for detection in genomes and for structural analysis.
Although mutated, ERV LTRs are more numerous and diverse than exogenous retroviral (XRV) LTRs. Hidden Markov models (HMMs), and alignments based on them, were created for HML- (human MMTV-like), general-beta-, gamma- and lentiretrovirus-like LTRs, plus a general-vertebrate LTR model. Training sets were XRV LTRs and RepBase LTR consensuses. The HML HMM was the most sensitive and detected 87% of the HML LTRs in human chromosome 19 at 96% specificity. Combining all HMMs with a low cutoff, for screening, recovered 71% of all LTRs that RepeatMasker identified in chromosome 19. HMM consensus sequences had a conserved modular LTR structure. Target site duplications (TG-CA), TATA (occasionally absent), an AATAAA box and a T-rich region were prominent features. Most of the conservation was located in, or adjacent to, R and U5, with evidence for stem loops. Several of the long HML LTRs contained long ORFs inserted after the second A-rich module. HMM consensus alignment allowed comparison of functional features, such as transcriptional start sites (sense and antisense), between XRVs and ERVs.
The modular, conserved and redundant orthoretroviral LTR structure, with three A-rich regions, is reminiscent of structurally relaxed Giardia promoters. The five HMMs provided a novel, broad-range, repeat-independent, ab initio method of LTR detection, with prospects for greater generalisation, and insight into LTR structure, which may aid development of LTR-targeted pharmaceuticals.
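The screening step reported above (combining several HMMs under one low cutoff and comparing the hits against RepeatMasker annotations) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the candidate identifiers, the scores and the cutoff value are all hypothetical, and the abstract does not state how scores were computed or how specificity was defined, so only sensitivity against a reference set is shown.

```python
from typing import Dict, Set

# Hypothetical per-model scores for candidate genomic windows.
# Keys of the outer dict are model names; inner dicts map a candidate ID
# to that model's score. All values are made up for illustration.
SCORES: Dict[str, Dict[str, float]] = {
    "HML": {"w1": 12.0, "w2": 3.0},
    "gamma": {"w2": 8.5, "w3": 1.0},
    "general": {"w3": 2.0},
}

# Hypothetical reference annotation (standing in for RepeatMasker LTR calls).
REFERENCE: Set[str] = {"w1", "w2", "w3", "w4"}


def combined_hits(scores: Dict[str, Dict[str, float]], cutoff: float) -> Set[str]:
    """Return candidates whose score reaches the cutoff in at least one model."""
    hits: Set[str] = set()
    for model_scores in scores.values():
        for candidate, score in model_scores.items():
            if score >= cutoff:
                hits.add(candidate)
    return hits


def sensitivity(hits: Set[str], reference: Set[str]) -> float:
    """Fraction of reference elements recovered: TP / (TP + FN)."""
    if not reference:
        raise ValueError("reference set must not be empty")
    return len(hits & reference) / len(reference)


if __name__ == "__main__":
    found = combined_hits(SCORES, cutoff=5.0)  # cutoff is an assumed value
    print(sorted(found), sensitivity(found, REFERENCE))
```

Taking the union of hits across models is what makes a low shared cutoff useful for screening: a candidate missed by one model can still be picked up by another, at the cost of more false positives that a later, stricter step would need to filter.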