Institute of Agrobiotechnology, Centre for Research and Technology Hellas, Thessaloniki, 57001, Greece.
BMC Genomics. 2012 Apr 30;13:158. doi: 10.1186/1471-2164-13-158.
Sireviruses are an ancient genus of the Copia superfamily of LTR retrotransposons, and the only one that has proliferated exclusively within plant genomes. Experimental data and phylogenetic analyses show that Sireviruses have successfully infiltrated many branches of the plant kingdom, extensively colonizing the genomes of grass species. Notably, it was recently shown that they have been a major force in the make-up and evolution of the maize genome, where they currently occupy ~21% of the nuclear content and ~90% of the Copia population. It is highly likely, therefore, that their life dynamics have been fundamental to the genome composition and organization of a plethora of plant hosts. To assist studies of their impact on plant genome evolution, and to facilitate accurate identification and annotation of transposable elements in sequencing projects, we developed MASiVEdb (Mapping and Analysis of SireVirus Elements Database), a collective and systematic resource of Sireviruses in plants.
Taking advantage of the increasing availability of plant genomic sequences, and using an updated version of MASiVE, an algorithm specifically designed to identify Sireviruses based on their highly conserved genome structure, we populated MASiVEdb (http://bat.infspire.org/databases/masivedb/) with data on 16,243 intact Sireviruses (total length >158 Mb) discovered in 11 fully sequenced plant genomes. MASiVEdb is unlike any other transposable element database in that it provides a wealth of highly curated, detailed information on a specific genus across its hosts, such as a complete set of coordinates, the insertion age, and an analytical breakdown of the structure and gene complement of each element. All data are readily available through basic and advanced query interfaces, batch retrieval, and downloadable files. A purpose-built system is also offered for detecting and visualizing similarity between user sequences and Sireviruses, as well as for coding domain discovery and phylogenetic analysis.
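The abstract does not state how the insertion age of each element is computed. As an illustration only, the sketch below shows an approach widely used for LTR retrotransposons in general: the two LTRs of an element are identical at insertion, so their divergence K, corrected here with the Jukes-Cantor model, gives an age T = K / (2r). The function names and the default substitution rate r = 1.3e-8 per site per year (a value often used for grass genomes) are assumptions for this example and are not taken from MASiVEdb.

```python
import math


def jukes_cantor_distance(ltr5: str, ltr3: str) -> float:
    """Jukes-Cantor distance between two pre-aligned LTR sequences.

    Hypothetical helper: both strings must come from the same pairwise
    alignment, so they have equal length and may contain gap characters.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTR sequences must be aligned to the same length")

    # Keep only alignment columns where both LTRs have an unambiguous base.
    columns = [
        (a, b)
        for a, b in zip(ltr5.upper(), ltr3.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not columns:
        raise ValueError("no comparable sites in the alignment")

    # p is the observed proportion of mismatched sites.
    p = sum(a != b for a, b in columns) / len(columns)
    if p == 0:
        return 0.0
    if p >= 0.75:
        raise ValueError("divergence too high for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def insertion_age_years(ltr5: str, ltr3: str, rate: float = 1.3e-8) -> float:
    """Estimate insertion age as T = K / (2 * rate).

    `rate` is an assumed substitution rate per site per year; the factor
    of 2 reflects that both LTRs accumulate mutations independently.
    """
    return jukes_cantor_distance(ltr5, ltr3) / (2.0 * rate)


if __name__ == "__main__":
    five_prime = "ACGT" * 25             # toy 100 bp LTR
    three_prime = "T" + five_prime[1:]   # same LTR with one substitution
    print(round(insertion_age_years(five_prime, three_prime)))
```

With these toy sequences, one mismatch in 100 sites corresponds to an age of a few hundred thousand years under the assumed rate. Real estimates depend strongly on the alignment and on the rate chosen for the host species.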
MASiVEdb is currently the most comprehensive directory of Sireviruses, and as such complements other efforts to catalogue plant transposable elements and elucidate their role in host genome evolution. Such insights will gradually deepen, as we plan to further improve MASiVEdb by phylogenetically mapping Sireviruses into families, by including data on fragments and solo LTRs, and by incorporating elements from newly released genomes.