Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA.
North Carolina Research Campus, Kannapolis, NC, 28081, USA.
BMC Bioinformatics. 2023 Aug 22;24(1):317. doi: 10.1186/s12859-023-05419-5.
Transposable elements (TEs) are short, mobile DNA elements that are known to play important roles in the genomes of many eukaryotic species. The identification and categorization of these elements is a critical task for many genomic studies, and the continued increase in the number of de novo assembled genomes demands new tools to improve the efficiency of this process. For this reason, we developed RepBox, a suite of Python scripts that combine several pre-existing family-specific TE detection methods into a single user-friendly pipeline.
In comparisons with the standard TE detection software RepeatModeler on plant genomes, we find that RepBox consistently classifies more elements and identifies a more diverse array of TE families than the existing methods.
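To make the two comparison criteria concrete ("classifies more elements" and "a more diverse array of TE families"), the sketch below counts classified entries and distinct classification labels in a TE library. It assumes the RepeatModeler/RepeatMasker FASTA header convention `name#Class/Superfamily`, where `Unknown` marks an unclassified element. The function name `summarize_library` and the toy headers are hypothetical illustrations. This is not the evaluation code used in the study.

```python
from collections import Counter


def summarize_library(headers: list[str]) -> tuple[int, int, Counter]:
    """Count classified elements and distinct labels in a TE library.

    Assumes (for illustration) RepeatModeler-style headers such as
    '>rnd-1_family-0#LTR/Gypsy', where the text after '#' is the label.
    Returns (number classified, number of distinct labels, per-label counts).
    """
    counts: Counter = Counter()
    for header in headers:
        # Keep only the sequence ID (first whitespace-delimited token)
        seq_id = header.lstrip(">").split()[0]
        _, sep, label = seq_id.partition("#")
        # No '#', an empty label, or an 'Unknown' label means unclassified
        if not sep or not label or label.startswith("Unknown"):
            continue
        counts[label] += 1
    return sum(counts.values()), len(counts), counts


# Toy headers (hypothetical, not data from the study)
toy = [
    ">rnd-1_family-0#LTR/Gypsy",
    ">rnd-1_family-1#Unknown",
    ">rnd-2_family-5#LTR/Copia",
    ">rnd-3_family-2#LTR/Gypsy",
    ">rnd-4_family-9",
]
n_classified, n_families, per_family = summarize_library(toy)
print(n_classified, n_families)  # 3 2
```

Running the same summary over the libraries produced by each tool for one genome gives a like-for-like count of classified elements and represented families. Grouping labels at the superfamily level rather than the class level is a design choice: it rewards finer-grained classification.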
The performance of RepBox on two different plant genomes indicates that our toolbox represents a significant improvement over existing TE detection methods and should facilitate future TE annotation efforts in additional species.