Titievsky Avi, Putintseva Yuliya A, Taranenko Elizaveta A, Baskin Sofya, Oreshkova Natalia V, Brodsky Elia, Sharova Alexandra V, Sharov Vadim V, Panov Julia, Kuzmin Dmitry A, Brodsky Leonid, Krutovsky Konstantin V
Tauber Bioinformatics Research Center, University of Haifa, Haifa 3498838, Israel.
Laboratory of Forest Genomics, Genome Research and Education Center, Institute of Fundamental Biology and Biotechnology, Siberian Federal University, 660036 Krasnoyarsk, Russia.
Life (Basel). 2021 Nov 15;11(11):1234. doi: 10.3390/life11111234.
Repetitive elements (REs) and transposons (TEs) can comprise up to 80% of some plant genomes and may be essential for regulating their evolution and adaptation. "Repeatome" information is often unavailable in assembled genomes because repeat-rich genomic regions are challenging to assemble and are often missing from the final assembly. However, raw genomic sequencing data contain rich information about REs and TEs. Here, raw genomic NGS reads of 10 gymnosperm species were studied for the content and abundance patterns of their "repeatome". We combined alignment of genomic sequencing reads against databases of repetitive elements with de novo assembly of highly repetitive sequences from those reads to characterize known and putative repetitive elements and calculate their abundance in the genomes of 10 conifer species. We found that the genome abundances of known and newly discovered putative repeats are specific to phylogenetically close groups of species and match biological taxa. The grouping of species based on abundances of known repeats closely matches both the grouping based on abundances of newly discovered putative repeats and the known taxonomic relations.
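The abundance-based grouping described in the abstract can be illustrated with a minimal sketch. It assumes that each species already has a table of read counts per repeat family, for example from read alignment against a repeat database. The species names, read counts, function names, and the choice of Bray-Curtis dissimilarity are all illustrative assumptions; the abstract does not state which distance or clustering method the authors used.

```python
# Minimal sketch: compare species by repeat-family abundance profiles.
# All data below are invented toy values, not results from the study.

def relative_abundance(counts, total_reads):
    """Fraction of sequenced reads assigned to each repeat family."""
    return {family: n / total_reads for family, n in counts.items()}

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical)."""
    families = set(a) | set(b)
    num = sum(abs(a.get(f, 0.0) - b.get(f, 0.0)) for f in families)
    den = sum(a.get(f, 0.0) + b.get(f, 0.0) for f in families)
    return num / den if den else 0.0

def nearest_neighbours(profiles):
    """For each species, return the species with the most similar profile."""
    return {
        s: min(
            (t for t in profiles if t != s),
            key=lambda t: bray_curtis(profiles[s], profiles[t]),
        )
        for s in profiles
    }

# Hypothetical input: (reads per repeat family, total reads sampled).
# "putative_1" stands in for a de novo assembled, unannotated repeat.
toy_counts = {
    "species_A": ({"Gypsy": 400, "Copia": 300, "putative_1": 50}, 1000),
    "species_B": ({"Gypsy": 420, "Copia": 280, "putative_1": 60}, 1000),
    "species_C": ({"Gypsy": 150, "Copia": 500, "putative_1": 200}, 1000),
    "species_D": ({"Gypsy": 170, "Copia": 480, "putative_1": 210}, 1000),
}

profiles = {s: relative_abundance(c, n) for s, (c, n) in toy_counts.items()}
neighbours = nearest_neighbours(profiles)

for species, closest in sorted(neighbours.items()):
    print(species, "is closest to", closest)
```

In this toy input, species with similar repeat profiles pair up with each other, which is the kind of pattern the abstract reports for phylogenetically close taxa. A real analysis would typically feed the full pairwise distance matrix into hierarchical clustering rather than reading off nearest neighbours.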