Solares Edwin A, Chakraborty Mahul, Miller Danny E, Kalsow Shannon, Hall Kate, Perera Anoja G, Emerson J J, Hawley R Scott
Department of Ecology and Evolutionary Biology, University of California Irvine, CA.
Stowers Institute for Medical Research, Kansas City, MO.
G3 (Bethesda). 2018 Oct 3;8(10):3143-3154. doi: 10.1534/g3.118.200162.
Accurate and comprehensive characterization of genetic variation is essential for deciphering the genetic basis of diseases and other phenotypes. A vast amount of genetic variation stems from large-scale sequence changes arising from the duplication, deletion, inversion, and translocation of sequences. In the past 10 years, high-throughput short reads have greatly expanded our ability to assay sequence variation due to single nucleotide polymorphisms. However, a recent assembly of a second reference genome has revealed that short read genotyping methods miss hundreds of structural variants, including those affecting phenotypes. While genomes assembled using high-coverage long reads can achieve high levels of contiguity and completeness, concerns about cost, errors, and low yield have limited widespread adoption of such sequencing approaches. Here we resequenced the reference strain of Drosophila melanogaster (ISO1) on a single Oxford Nanopore MinION flow cell run for 24 hr. Using only reads longer than 1 kb or with at least 30x coverage, we assembled a highly contiguous genome. The addition of inexpensive paired reads and subsequent scaffolding using an optical map technology achieved an assembly with completeness and contiguity comparable to the reference assembly. Comparison of our assembly to the reference assembly of ISO1 uncovered a number of structural variants (SVs), including novel LTR transposable element insertions and duplications affecting genes with developmental, behavioral, and metabolic functions. Collectively, these SVs provide a snapshot of the dynamics of genome evolution. Furthermore, our assembly and comparison to the reference genome demonstrate that high-quality assembly of reference genomes and comprehensive variant discovery using such assemblies are now possible by a single lab for under $1,000 (USD).
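The abstract refers to a read-length cutoff (reads longer than 1 kb), a coverage level, and assembly contiguity. As a rough illustration only, the Python sketch below shows how these three quantities are commonly computed from a list of sequence lengths. The function names `filter_reads`, `coverage`, and `n50`, and the toy lengths in the usage note, are hypothetical and are not taken from the authors' pipeline. N50 is used here as one standard contiguity metric, which the abstract does not name.

```python
from typing import Iterable, List


def filter_reads(lengths: Iterable[int], min_len: int = 1000) -> List[int]:
    """Keep only read lengths strictly greater than min_len (in bp)."""
    return [n for n in lengths if n > min_len]


def coverage(lengths: Iterable[int], genome_size: int) -> float:
    """Mean depth of coverage: total sequenced bases divided by genome size."""
    return sum(lengths) / genome_size


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases.

    Returns 0 for an empty input.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for n in ordered:
        running += n
        if running >= half:
            return n
    return 0
```

For example, `filter_reads([500, 1000, 1500, 20000])` keeps only the 1,500 bp and 20,000 bp reads, and `n50` applied to a list of contig lengths gives a single-number summary of how contiguous an assembly is.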