Steinbauerová Veronika, Neumann Pavel, Novák Petr, Macas Jiří
Institute of Plant Molecular Biology, Biology Centre ASCR, Branišovská 31, Ceske Budejovice, Czech Republic.
Genetica. 2011 Dec;139(11-12):1543-55. doi: 10.1007/s10709-012-9654-9. Epub 2012 Apr 29.
Long terminal repeat (LTR) retrotransposons make up substantial parts of most higher plant genomes, where they accumulate due to their replicative mode of transposition. Although transposition is facilitated by proteins encoded within the gag-pol region, which is common to all autonomous elements, some LTR retrotransposons have been found to potentially carry additional protein-coding capacity in the form of extra open reading frames (ORFs) located upstream or downstream of gag-pol. In this study, we performed a comprehensive in silico survey and comparative analysis of these extra ORFs in the group of Ty3/gypsy LTR retrotransposons as a first step towards understanding their origin and function. We found that extra ORFs occur in all three major lineages of plant Ty3/gypsy elements and are most frequent in the Tat lineage, where most (77%) of the identified elements contained extra ORFs. This lineage was also characterized by the highest diversity of extra ORF arrangement (position and orientation) within the elements. On the other hand, all of these ORFs could be classified into only two broad groups based on their mutual similarities or the presence of short conserved motifs in their inferred protein sequences. In the Athila lineage, the extra ORFs were confined to the 3' regions of the elements, but they displayed much higher sequence diversity than those found in Tat. In the lineage of Chromoviruses, the extra ORFs were relatively rare, occurring only in the 5' regions of a group of elements present in a single plant family (Poaceae). In all three lineages, most extra ORFs lacked sequence similarity to characterized gene sequences or functional protein domains, except for two Athila-like elements with similarity to the LOGL4 gene and a subset of the Chromovirus extra ORFs that displayed partial similarity to the histone H3 gene. Thus, in these cases the extra ORFs most likely originated by transduction or recombination of cellular gene sequences. In addition, a protein domain otherwise associated with DNA transposons was detected in a subset of the Tat-like extra ORFs, pointing to their origin from an insertion event of a mobile element.