Department of Biology, University of Rochester, Rochester, NY, USA.
Mol Ecol Resour. 2021 Apr;21(3):969-981. doi: 10.1111/1755-0998.13305. Epub 2021 Jan 4.
Studies of repetitive DNA elements (REs) in model organisms highlight the role of REs in many processes that drive genome evolution and phenotypic change. Because REs are much more dynamic than single-copy DNA, repetitive sequences can reveal signals of evolutionary history over short time scales that may not be evident in sequences from slower-evolving genomic regions. Many tools for studying REs are directed toward organisms with existing genomic resources, including genome assemblies and repeat libraries. However, signals in repeat variation may prove especially valuable in disentangling evolutionary histories in diverse non-model groups, for which genomic resources are limited. Here, we introduce RepeatProfiler, a tool for generating, visualizing, and comparing repetitive element DNA profiles from low-coverage, short-read sequence data. RepeatProfiler automates the generation and visualization of RE coverage depth profiles (RE profiles) and allows for statistical comparison of profile shape across samples. In addition, RepeatProfiler facilitates comparison of profiles by extracting signal from sequence variants across profiles, which can then be analysed as molecular morphological characters in a phylogenetic analysis. We validate RepeatProfiler with data sets from ground beetles (Bembidion), flies (Drosophila), and tomatoes (Solanum). We highlight the potential of RE profiles as a high-resolution data source for studies in species delimitation, comparative genomics, and repeat biology.
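To make the idea of an RE coverage depth profile concrete, the following is a minimal sketch and not RepeatProfiler's own code. It assumes reads have already been mapped to a reference repeat sequence and are given as 0-based, half-open (start, end) intervals. The function names `depth_profile` and `pearson` are hypothetical, and Pearson correlation is used here only as one plausible way to compare profile shape, since it is unaffected by overall differences in sequencing depth.

```python
from math import sqrt


def depth_profile(ref_length, alignments):
    """Return per-base coverage depth along a reference repeat.

    alignments: iterable of (start, end) pairs, 0-based and half-open,
    giving where each read maps on the reference (an assumed input format).
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (ref_length + 1)
    for start, end in alignments:
        start = max(0, start)
        end = min(ref_length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    # A running sum of the difference array gives the depth at each position.
    depth = []
    running = 0
    for delta in diff[:ref_length]:
        running += delta
        depth.append(running)
    return depth


def pearson(x, y):
    """Pearson correlation between two equal-length depth profiles."""
    if len(x) != len(y) or not x:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a flat profile")
    return cov / sqrt(var_x * var_y)


# Toy data: sample B has the same read placements as sample A at twice the depth.
reads_a = [(0, 5), (2, 8), (4, 10)]
reads_b = reads_a * 2

profile_a = depth_profile(10, reads_a)
profile_b = depth_profile(10, reads_b)
shape_similarity = pearson(profile_a, profile_b)
```

In this toy case the two profiles differ only in scale, so their correlation is 1 up to floating-point error. Real samples would differ in shape wherever repeat copy number or structure varies along the reference.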