School of Computer Science and Engineering, Central South University, Changsha, 410083, China.
Xiangjiang Laboratory, Changsha, 410205, China.
Nat Commun. 2024 Jul 2;15(1):5573. doi: 10.1038/s41467-024-49912-8.
Recent advancements in genome assembly have greatly improved the prospects for comprehensive annotation of Transposable Elements (TEs). However, existing methods for TE annotation using genome assemblies suffer from limited accuracy and robustness, requiring extensive manual editing. In addition, the currently available gold-standard TE databases are not comprehensive, even for extensively studied species, highlighting the critical need for an automated TE detection method to supplement existing repositories. In this study, we introduce HiTE, a fast and accurate dynamic boundary adjustment approach designed to detect full-length TEs. The experimental results demonstrate that HiTE outperforms RepeatModeler2, the state-of-the-art tool, across various species. Furthermore, HiTE has identified numerous novel transposons with well-defined structures containing protein-coding domains, some of which are directly inserted within crucial genes, leading to direct alterations in gene expression. A Nextflow version of HiTE is also available, with enhanced parallelism, reproducibility, and portability.