Department of Computer Science; Bioinformatics and Pattern Recognition Group, Graduation Program in Bioinformatics, Federal University of Technology - Paraná (UTFPR), Cornélio Procópio, Paraná, 86300000, Brazil.
Department of Agricultural and Environmental Biotechnology, School of Agricultural and Veterinary Sciences, São Paulo State University (UNESP), Jaboticabal, São Paulo, 14884-900, Brazil.
F1000Res. 2021 Nov 24;10:1194. doi: 10.12688/f1000research.74524.1. eCollection 2021.
Advances in genomic sequencing have recently opened vast opportunities for biological exploration, helping to unravel evolution and improve our understanding of Earth's biodiversity. Plant species differ markedly in genome size, ploidy and heterozygosity, and transposable elements (TEs) are a common feature of many of their genomes. TEs are ubiquitous, dispersed repetitive DNA sequences that frequently affect the evolution and composition of the genome, mainly through their redundancy and the rearrangements they cause. In this study, we provide an atlas of TE data through an easy-to-use portal (APTE website). To our knowledge, this is the most extensive and standardized analysis of TEs in plant genomes. We evaluated 67 plant genomes assembled at chromosome scale and recovered a total of 49,802,023 TE records, representing 47,992,091,043 base pairs (bp), or ~47.62% of the total genomic space. Compared with other data repositories, new types of TEs were identified and annotated. A standardized catalog of TE annotation across 67 genomes supports new hypotheses and further exploration of TE data and of their influence on genomes, which may allow a better understanding of TE function and processes. All original code and an example of how we developed the TE annotation strategy are available on GitHub.
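The coverage figure quoted above (TE base pairs as a percentage of genomic space) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `covered_bp` and `te_percent`, the half-open `(start, end)` interval convention, and the choice to count overlapping annotations only once are all assumptions made for illustration.

```python
from typing import Iterable, Tuple


def covered_bp(intervals: Iterable[Tuple[int, int]]) -> int:
    """Bases covered by half-open (start, end) intervals on one sequence.

    Assumption: overlapping or nested annotations are counted once.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged block, if any, and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def te_percent(te_bp: int, genome_bp: int) -> float:
    """TE-covered bases as a percentage of the genomic space."""
    if genome_bp <= 0:
        raise ValueError("genome_bp must be positive")
    return 100.0 * te_bp / genome_bp


# Toy example with hypothetical coordinates on a 1,000 bp sequence.
toy = [(0, 100), (50, 150), (300, 400)]
toy_pct = te_percent(covered_bp(toy), 1000)

# Genomic space implied by the abstract's own numbers (rounded percentage,
# so this is only an approximation).
implied_total_bp = 47_992_091_043 / 0.4762
```

With the abstract's figures, the implied total genomic space across the 67 genomes is on the order of 100 Gbp; the exact value depends on the rounding of the reported percentage.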