Elphinstone Cassandra, Elphinstone Rob, Todesco Marco, Rieseberg Loren H
Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada.
Independent Researcher, Nanaimo, British Columbia, Canada.
Mol Ecol Resour. 2025 Mar 4:e14084. doi: 10.1111/1755-0998.14084.
Tandem repeats play an important role in centromere structure, subtelomeric regions, DNA methylation, recombination and the regulation of gene activity. Analysis of their distribution in genomes offers a potential means for predicting putative centromere locations, which continues to be a challenge for genome annotation. Here we present RepeatOBserver (https://github.com/celphin/RepeatOBserverV1), a new tool for visualising repeat patterns and identifying putative centromere locations, using a Fourier transform of DNA walks. RepeatOBserver can identify and visualise a broad range of perfect and imperfect repeats (3-5000 bp long) in genome assemblies without any a priori knowledge of repeat sequences or the need for optimising parameters. RepeatOBserver heatmaps can distinguish between tandem and retrotransposon repeats. We analysed 159 chromosomes with experimentally-verified centromere positions from 12 plant and animal species. We find that 93% of experimentally-verified tandem repeat centromeres occur in regions of low sequence diversity and 97% of retrotransposon centromeres occur in regions with a high abundance of repeat lengths. Depending on the centromere type predicted by the heatmaps, putative centromere locations can be predicted using either a genomic Shannon diversity index or a repeat abundance sum. RepeatOBserver can also locate other regions of interest including potential neocentromeres and gene copy variation. Split and inverted tandem repeats at inversion boundaries suggest that chromosomal inversions or mis-assemblies can also be located. RepeatOBserver is a flexible tool for comprehensive characterisation of repeat patterns that can be used to visualise and identify a variety of regions of interest in genome assemblies.
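As a rough illustration of the idea named in the abstract (a Fourier transform of a DNA walk, followed by a Shannon diversity index), the sketch below shows how a repeat length can surface as a peak in a power spectrum and how concentrated spectral power yields low diversity. It is not RepeatOBserver's implementation. The purine/pyrimidine walk encoding, the linear detrending, the use of normalised spectral power as the distribution for the Shannon index, and all function names and default parameters are assumptions made for this example only.

```python
import numpy as np


def dna_walk(seq):
    """Cumulative walk: +1 for purines (A/G), -1 for pyrimidines (C/T), 0 otherwise.

    Assumption: this encoding is one common choice, not necessarily the tool's.
    """
    steps = np.array(
        [1.0 if b in "AG" else -1.0 if b in "CT" else 0.0 for b in seq.upper()]
    )
    return np.cumsum(steps)


def repeat_spectrum(seq, min_len=3, max_len=100):
    """Return (repeat_lengths, power) from the Fourier transform of the walk.

    min_len and max_len (in bp) are hypothetical bounds for this sketch.
    """
    walk = dna_walk(seq)
    n = walk.size
    x = np.arange(n)
    # Remove the linear trend so that base-composition drift does not dominate.
    slope, intercept = np.polyfit(x, walk, 1)
    detrended = walk - (slope * x + intercept)
    power = np.abs(np.fft.rfft(detrended)) ** 2
    k = np.arange(1, power.size)      # skip the zero-frequency bin
    lengths = n / k                   # repeat length (period) in bp for each bin
    keep = (lengths >= min_len) & (lengths <= max_len)
    return lengths[keep], power[1:][keep]


def dominant_repeat_length(seq, **kwargs):
    """Repeat length whose Fourier bin carries the most power."""
    lengths, power = repeat_spectrum(seq, **kwargs)
    return float(lengths[np.argmax(power)])


def shannon_diversity(power):
    """Shannon index H = -sum(p * ln p) over the normalised spectral power."""
    total = power.sum()
    if total == 0:
        return 0.0
    p = power / total
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


if __name__ == "__main__":
    tandem = "AAAAACCCCC" * 50        # perfect 10 bp tandem repeat, 500 bp window
    rng = np.random.default_rng(0)
    random_seq = "".join(rng.choice(list("ACGT"), size=500))

    print("dominant repeat length:", dominant_repeat_length(tandem))
    print("H (tandem):", shannon_diversity(repeat_spectrum(tandem)[1]))
    print("H (random):", shannon_diversity(repeat_spectrum(random_seq)[1]))
```

In this toy setting the tandem window concentrates its power at a repeat length of 10 bp and therefore scores a lower Shannon index than the random window, which is in the spirit of the abstract's observation that tandem repeat centromeres fall in regions of low sequence diversity. Applying the same functions to successive windows along a chromosome would give one spectrum per window, which could then be drawn as a heatmap.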