Gurdon Institute, University of Cambridge, Cambridge CB2 1QN, UK.
Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK.
Nucleic Acids Res. 2022 Jun 24;50(11):e64. doi: 10.1093/nar/gkac136.
Most genomes harbor a large number of transposons, which play an important role in evolution and gene regulation. They are also of interest to clinicians because they are involved in several diseases, including cancer and neurodegeneration. Although several methods for transposon identification are available, they are often highly specialised towards specific tasks or classes of transposons, and they lack common standards such as a unified taxonomy scheme and output file format. We present TransposonUltimate, a bundle of three modules for transposon classification, annotation, and detection of transposition events. TransposonUltimate is distributed as a Conda package under the GPL-3.0 licence, is well documented, and is easy to install through https://github.com/DerKevinRiehl/TransposonUltimate. We benchmark the classification module on the large TransposonDB, which covers 891,051 sequences, to demonstrate that it outperforms the best existing solutions. The annotation and detection modules combine sixteen existing software tools, and we illustrate their use by annotating the Caenorhabditis elegans, Rhizophagus irregularis and Oryza sativa subsp. japonica genomes. Finally, we use the detection module to discover 29,554 transposition events in the genomes of 20 wild-type strains of C. elegans. Databases, assemblies, annotations and further findings can be downloaded from https://doi.org/10.5281/zenodo.5518085.