Department of Plant Developmental Genetics, Institute of Biophysics of the Czech Academy of Sciences, 61200 Brno, Czech Republic.
Department of Machine Learning and Data Processing, Faculty of Informatics, Masaryk University, 60200 Brno, Czech Republic.
Bioinformatics. 2020 Dec 22;36(20):4991-4999. doi: 10.1093/bioinformatics/btaa632.
Transposable elements (TEs) in eukaryotes often insert into one another, producing sequences that are a complex mixture of full-length elements and their fragments. Reconstructing the full-length elements and the order in which they were inserted is important for studies of genome and transposon evolution. However, mutations and genome rearrangements accumulated over evolutionary time make this reconstruction error-prone and reduce the efficiency of software that aims to recover all nested full-length TEs.
We created software that uses a greedy recursive algorithm to mine increasingly fragmented copies of full-length LTR retrotransposons in assembled genomes and other sequence data. The software, called TE-greedy-nester, considers not only sequence similarity but also the structure of elements. We tested the tool on a set of natural and synthetic sequences and compared its accuracy with that of similar software. TE-greedy-nester was superior in several respects, namely computation time and recovery of full-length TEs in highly nested regions.
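The greedy recursive idea can be illustrated with a toy sketch. It is not the TE-greedy-nester implementation: exact string matching against a small hypothetical library (`TOY_LIBRARY`) stands in for the similarity- and structure-based detection of full-length elements, and the function names `find_best_element` and `nest` are invented for this example. Each round finds an intact element, records its coordinates in the original sequence, excises it, and recurses on the remainder, so that an older element split by the insertion becomes contiguous and detectable.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy library: one "full-length" sequence per element family.
TOY_LIBRARY: Dict[str, str] = {"A": "aaAAAAaa", "B": "bbBBBBbb"}


def find_best_element(seq: str, library: Dict[str, str]) -> Optional[Tuple[str, int, int]]:
    """Return (family, start, end) of the longest intact element in seq, or None.

    Exact matching is an assumption made for this toy; a real detector would
    score similarity and element structure instead.
    """
    best: Optional[Tuple[str, int, int]] = None
    for family, full in library.items():
        start = seq.find(full)
        if start != -1 and (best is None or len(full) > best[2] - best[1]):
            best = (family, start, start + len(full))
    return best


def nest(seq: str, library: Dict[str, str],
         coords: Optional[List[int]] = None,
         round_no: int = 0) -> List[Tuple[str, int, int, int]]:
    """Greedily excise intact elements and recurse on what is left.

    Returns (family, original_start, original_end_inclusive, round) tuples.
    Elements found in earlier rounds are the more recent insertions.
    """
    if coords is None:
        coords = list(range(len(seq)))  # map current positions to original ones
    hit = find_best_element(seq, library)
    if hit is None:
        return []
    family, start, end = hit
    record = (family, coords[start], coords[end - 1], round_no)
    # Remove the element so that the flanks of an interrupted host element join.
    rest_seq = seq[:start] + seq[end:]
    rest_coords = coords[:start] + coords[end:]
    return [record] + nest(rest_seq, library, rest_coords, round_no + 1)


# Element B inserted into the middle of element A, flanked by host sequence.
example = "nnn" + "aaAA" + "bbBBBBbb" + "AAaa" + "nn"
result = nest(example, TOY_LIBRARY)
print(result)  # [('B', 7, 14, 0), ('A', 3, 18, 1)]
```

In this example B is recovered first because it is the only intact element; A is found only after B has been excised, which reflects the insertion order (A first, then B into A).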
The software is available at http://gitlab.fi.muni.cz/lexa/nested.
Supplementary data are available at Bioinformatics online.