Kapitonov Vladimir V, Tempel Sébastien, Jurka Jerzy
Genetic Information Research Institute, 1925 Landings Dr, Mountain View, CA 94041, USA.
Gene. 2009 Dec 15;448(2):207-13. doi: 10.1016/j.gene.2009.07.019. Epub 2009 Aug 3.
The rapidly growing number of sequenced genomes requires fast and accurate computational tools for the analysis of different transposable elements (TEs). In this paper we focus on a rapid and reliable procedure for classifying autonomous non-LTR retrotransposons based on alignment and clustering of their reverse transcriptase (RT) domains. Typically, the RT domain protein sequences encoded by different non-LTR retrotransposons are similar enough to one another to yield significant BLASTP E-values. Therefore, they can be easily detected by routine BLASTP searches for genomic DNA sequences coding for proteins similar to the RT domains of known non-LTR retrotransposons. However, detailed classification of non-LTR retrotransposons, i.e. their assignment to specific clades, is a slow and complex procedure that has not been formalized or integrated into a standard set of computational methods and data. Here we describe a tool (RTclass1) designed for the fast and accurate automated assignment of novel non-LTR retrotransposons to known or novel clades using phylogenetic analysis of the RT domain protein sequences. RTclass1 classifies a given non-LTR retrotransposon based on its RT domain in less than 10 min on a standard desktop computer and achieves 99.5% accuracy. RTclass1 works either as a stand-alone program installed locally or as a web server that can be accessed remotely by uploading sequence data over the internet (http://www.girinst.org/RTphylogeny/RTclass1).
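The abstract does not spell out how RTclass1 assigns a sequence to a clade, so the following is only a toy sketch of the general idea of placing a query RT domain in a known clade or flagging it as novel. It swaps in a much simpler technique than phylogenetic analysis: nearest-reference assignment by p-distance over pre-aligned sequences. The function names, the clade labels (`cladeA`, `cladeB`), the toy sequences, and the `max_distance` cutoff are all hypothetical and are not taken from RTclass1.

```python
# Illustrative sketch only: this is NOT the RTclass1 algorithm.
# Assumption: all sequences are already aligned to the same length,
# with "-" marking gap columns.
from typing import Dict, List, Tuple


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = 0
    diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x != y:
            diffs += 1
    # With no comparable columns, treat the pair as maximally distant.
    return diffs / compared if compared else 1.0


def assign_clade(
    query: str,
    references: Dict[str, List[str]],
    max_distance: float = 0.6,  # hypothetical cutoff, not a published value
) -> Tuple[str, float]:
    """Return (clade label, distance) for the closest reference sequence.

    If even the closest reference is farther than max_distance,
    the query is reported as "novel".
    """
    best_clade, best = "novel", float("inf")
    for clade, seqs in references.items():
        for ref in seqs:
            d = p_distance(query, ref)
            if d < best:
                best_clade, best = clade, d
    if best > max_distance:
        return "novel", best
    return best_clade, best


# Toy reference set with made-up sequences and hypothetical clade labels.
REFERENCES = {
    "cladeA": ["MKVLA-GT", "MKVLAAGT"],
    "cladeB": ["QRSTWYHH"],
}

if __name__ == "__main__":
    print(assign_clade("MKVLSAGT", REFERENCES))
    print(assign_clade("DDDDDDDD", REFERENCES))
```

A real classifier of this kind would build a multiple alignment and a tree with clade support values rather than rely on a single pairwise distance; the sketch only shows the known-versus-novel decision in its simplest form.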