Department of Genetics, Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, NJ, USA.
Nucleic Acids Res. 2020 Jan 10;48(1):290-303. doi: 10.1093/nar/gkz1080.
Illumina sequencing has allowed for population-level surveys of transposable element (TE) polymorphism via split alignment approaches, which has provided important insight into the population dynamics of TEs. However, such approaches are not able to identify insertions of uncharacterized TEs, nor can they assemble the full sequence of inserted elements. Here, we use nanopore sequencing and Hi-C scaffolding to produce de novo genome assemblies for two wild strains of Drosophila melanogaster from the Drosophila Genetic Reference Panel (DGRP). Ovarian piRNA populations and Illumina split-read TE insertion profiles have been previously produced for both strains. We find that nanopore sequencing with Hi-C scaffolding produces highly contiguous, chromosome-length scaffolds, and we identify hundreds of TE insertions that were missed by Illumina-based methods, including a novel micropia-like element that has recently invaded the DGRP population. We also find hundreds of piRNA-producing loci that are specific to each strain. Some of these loci are created by strain-specific TE insertions, while others appear to be epigenetically controlled. Our results suggest that Illumina approaches reveal only a portion of the repetitive sequence landscape of eukaryotic genomes and that population-level resequencing using long reads is likely to provide novel insight into the evolutionary dynamics of repetitive elements.