Kirov Ilya, Kolganova Elizaveta, Dudnikov Maxim, Yurkevich Olga Yu, Amosova Alexandra V, Muravenko Olga V
All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, Moscow 127550, Russia.
Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia.
Plants (Basel). 2022 Aug 12;11(16):2103. doi: 10.3390/plants11162103.
High-copy tandemly organized repeats (TRs), or satellite DNA (satDNA), are an important but still enigmatic component of eukaryotic genomes. TRs form arrays of many highly similar monomers, which makes their elucidation very challenging. Oxford Nanopore sequencing data provide a valuable source of information on TR organization at the single-molecule level. However, bioinformatics tools for de novo identification of TRs in raw Nanopore data have not been reported so far. We developed NanoTRF, a new Python pipeline for TR identification, characterization and consensus monomer sequence assembly. The pipeline requires only a raw Nanopore read file from low-depth (<1×) genome sequencing. It generates an informative HTML report and figures on TR genome abundance, monomer sequence and monomer length. In addition, NanoTRF annotates transposable element (TE) sequences within or near satDNA arrays, and this information can be used to elucidate how TRs and TEs co-evolve in the genome. Moreover, we validated by FISH that the NanoTRF report is useful for evaluating TR chromosome organization (clustered or dispersed). Our findings show that NanoTRF is a robust method for the de novo identification of satellite repeats in raw Nanopore data without prior read assembly. The obtained sequences can be used in many downstream analyses, including genome assembly assistance and gap estimation, chromosome mapping and cytogenetic marker development.