Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstr. 108, 01307 Dresden, Germany.
Max Planck Institute for the Physics of Complex Systems, Noethnitzer Str. 38, 01187 Dresden, Germany.
Gigascience. 2019 Nov 1;8(11). doi: 10.1093/gigascience/giz132.
Transposons and other repetitive sequences make up a large part of complex genomes. Repetitive sequences can be co-opted into a variety of functions and thus provide a source for evolutionary novelty. However, comprehensively detecting ancestral repeats that align between species is difficult: alignment methods that rely on the seed-and-extend heuristic incur prohibitively high runtimes if they consider all seeds that overlap repeats.
Here, we show that ignoring repeat-overlapping alignment seeds when aligning entire genomes misses numerous alignments between repetitive elements. We present a tool, RepeatFiller, that improves genome alignments by incorporating previously undetected local alignments between repetitive sequences. By applying RepeatFiller to genome alignments between human and 20 other representative mammals, we uncover between 22 and 84 Mb of previously undetected alignments that mostly overlap transposable elements. We further show that the increased alignment coverage improves the annotation of conserved non-exonic elements, both by discovering numerous novel transposon-derived elements that evolve under constraint and by removing thousands of elements that are not under constraint in placental mammals.
RepeatFiller contributes to comprehensively aligning repetitive genomic regions, which facilitates studying transposon co-option and genome evolution. Source code: https://github.com/hillerlab/GenomeAlignmentTools.
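The effect of ignoring repeat-overlapping seeds can be illustrated with a toy example. The sketch below is not RepeatFiller's code or the aligner used in the study; it is a minimal illustration that assumes repeats are soft-masked as lowercase letters and that seeds are exact k-mer matches. The function name `find_seeds`, the parameter `skip_masked`, and the two sequences are hypothetical.

```python
def find_seeds(target, query, k=4, skip_masked=True):
    """Return (target_pos, query_pos) pairs of exact k-mer matches.

    Assumption: lowercase bases mark soft-masked repeats. When
    skip_masked is True, any k-mer overlapping a lowercase base is
    ignored, mimicking a seeding step that skips repeats.
    """
    # Index every admissible k-mer of the target by its uppercase form.
    index = {}
    for i in range(len(target) - k + 1):
        kmer = target[i:i + k]
        if skip_masked and not kmer.isupper():
            continue
        index.setdefault(kmer.upper(), []).append(i)

    # Look up every admissible k-mer of the query in the index.
    seeds = []
    for j in range(len(query) - k + 1):
        kmer = query[j:j + k]
        if skip_masked and not kmer.isupper():
            continue
        for i in index.get(kmer.upper(), []):
            seeds.append((i, j))
    return seeds


# Made-up sequences: a shared unique flank, a shared repeat (lowercase),
# and a diverged unique flank.
target = "ACGTTGCA" + "ggatccaagctt" + "TTAGGCAT"
query = "ACGTTGCA" + "ggatccaagctt" + "CCATGGAA"

masked_seeds = find_seeds(target, query, skip_masked=True)
all_seeds = find_seeds(target, query, skip_masked=False)
```

With masking, seeds are found only in the shared unique flank, so an extension step would have no starting point inside the shared repeat. Without masking, the repeat also yields seeds, at the cost of many more seeds to process on real genomes, where repeats occur in thousands of copies.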