He Hao, Shen Fei, Hou Yong, Yang Xiaozeng
College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China.
Institute of Biotechnology, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China.
Bioinform Adv. 2025 Jul 4;5(1):vbaf162. doi: 10.1093/bioadv/vbaf162. eCollection 2025.
Long terminal repeat retrotransposons (LTR-RTs) make up a substantial fraction of the repetitive sequences in many plant genomes. LTR-RTs are functionally important: they can alter the function of gene families and contribute to the formation of new genes. Investigating the abundance and activity of LTR-RTs is therefore essential for understanding the evolutionary dynamics of a species and the mechanisms that drive genome evolution. Although existing software can predict LTR-RTs and give them an initial classification, more comprehensive and efficient tools are needed to classify them in detail and to characterize and quantify the families involved in burst events, especially given the rapidly growing demand for genome annotation.
In this study, we have developed a pipeline called Volcano to accurately classify LTR-RTs and characterize burst families in plants. To distinguish different clades of LTR-RTs, we have implemented an improved depth-first search algorithm. Volcano can also quantify LTR-RT expression using RNA-seq data. By analyzing LTR-RTs in three genomes from the Asteraceae family, we observed that larger genomes tend to contain a greater number of LTR-RTs, and our software effectively categorizes them at the clade level.
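The abstract does not describe how the improved depth-first search works, so the following is only a minimal sketch of one way a plain depth-first traversal could propagate clade labels through a tree of LTR-RT sequences. It is not Volcano's actual algorithm. The tree layout, the function names `clade_sets` and `assign_clades`, the sequence identifiers, and the rule "a subtree containing exactly one reference clade inherits that clade" are all assumptions made for illustration.

```python
# Hypothetical sketch: label LTR-RT leaves by clade with depth-first search.
# Assumption: the tree is a dict mapping each internal node to its children,
# and a few leaves are reference sequences with a known clade.

def clade_sets(tree, node, known, cache):
    """Post-order DFS: record the set of reference clades under each node."""
    children = tree.get(node, [])
    if not children:
        # Leaf: contributes its clade only if it is a reference sequence.
        cache[node] = {known[node]} if node in known else set()
    else:
        cache[node] = set()
        for child in children:
            cache[node] |= clade_sets(tree, child, known, cache)
    return cache[node]


def assign_clades(tree, root, known):
    """Pre-order DFS: a subtree holding exactly one reference clade
    passes that clade down to every leaf it contains."""
    cache = {}
    clade_sets(tree, root, known, cache)
    assigned = {}
    stack = [(root, None)]  # (node, clade inherited from an ancestor)
    while stack:
        node, inherited = stack.pop()
        if inherited is None and len(cache[node]) == 1:
            inherited = next(iter(cache[node]))
        children = tree.get(node, [])
        if not children:
            assigned[node] = inherited  # stays None if no clade applies
        stack.extend((child, inherited) for child in children)
    return assigned


# Toy input; identifiers and topology are invented for this example.
tree = {
    "root": ["n1", "n2", "ltr_5"],
    "n1": ["ref_Tekay", "ltr_1"],
    "n2": ["n3", "ltr_4"],
    "n3": ["ref_Ale", "ltr_2"],
}
known = {"ref_Tekay": "Tekay", "ref_Ale": "Ale"}
result = assign_clades(tree, "root", known)
```

In this toy input, `ltr_1` groups with the Tekay reference, `ltr_2` and `ltr_4` group with the Ale reference, and `ltr_5` stays unassigned (`None`) because no single reference clade covers it. The first pass is recursive, so a very deep tree would need an iterative rewrite or a higher recursion limit.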
The Volcano pipeline can be downloaded from https://github.com/Suosihe/volcano_LTR.