Computational Biology and Medical Ecology Lab, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China; Kunming College of Life Science, Chinese Academy of Sciences, Kunming, 650223, China.
Computational Biology and Medical Ecology Lab, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China; Kunming College of Life Science, Chinese Academy of Sciences, Kunming, 650223, China.
Genomics. 2019 Dec;111(6):1896-1901. doi: 10.1016/j.ygeno.2018.12.013. Epub 2018 Dec 27.
Third-generation sequencing (3GS) technologies generate ultra-long reads (up to 1 Mb), which make it possible to eliminate gaps and effectively resolve repeats in genome assembly. However, 3GS technologies suffer from high base-level error rates (15%-40%) and high sequencing costs. To address these issues, the hybrid assembly strategy, which uses both 3GS reads and inexpensive next-generation sequencing (NGS) short reads, was developed. Here, we use 10×-Genomics® technology, which combines a novel barcoding strategy with Illumina® NGS and has the advantage of revealing long-range sequence information, to replace common NGS short reads in the hybrid assembly of long, error-prone 3GS reads. We demonstrate the feasibility of integrating 3GS with 10×-Genomics technologies as a new strategy for hybrid de novo genome assembly using the DBG2OLC and Sparc software packages, previously developed by the authors for regular hybrid assembly. Using a human genome as an example, we show that with only 7× coverage of ultra-long Nanopore® reads, augmented with 10×-Genomics reads, our approach achieves nearly the same quality as a non-hybrid assembly with 35× coverage of Nanopore reads. Compared with an assembly from 10×-Genomics reads alone, our assembly is gapless at a slightly higher cost. These results suggest that our new hybrid assembly, with ultra-long 3GS reads augmented with 10×-Genomics reads, offers a low-cost (less than ¼ the cost of the non-hybrid assembly) and computationally lightweight (109 calendar hours with a peak memory usage of 61 GB on a dual-CPU office workstation) solution for extending the applications of 3GS technologies.
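As a rough illustration of the coverage figures quoted above, the following minimal sketch uses the standard definition of average sequencing depth (total sequenced bases divided by genome size) to compare the long-read data volume at 7× and 35×. The genome size constant and the function names are assumptions introduced here for illustration and do not come from the paper. The sketch covers only the Nanopore read volume; it does not model the cost of the 10×-Genomics library, so it is not a derivation of the authors' cost estimate.

```python
def sequencing_depth(total_bases: float, genome_size: float) -> float:
    """Average depth of coverage: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def bases_needed(depth: float, genome_size: float) -> float:
    """Total bases required to reach a target average depth."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return depth * genome_size


# Assumption: approximate human genome size in base pairs (illustrative only).
HUMAN_GENOME_BP = 3.1e9

# Long-read data volume for the two coverage levels named in the abstract.
hybrid_bases = bases_needed(7, HUMAN_GENOME_BP)       # 7x Nanopore (hybrid)
non_hybrid_bases = bases_needed(35, HUMAN_GENOME_BP)  # 35x Nanopore (non-hybrid)

# Fraction of long-read data the hybrid approach needs relative to non-hybrid.
long_read_fraction = hybrid_bases / non_hybrid_bases

print(f"Hybrid long-read bases:     {hybrid_bases:.3e}")
print(f"Non-hybrid long-read bases: {non_hybrid_bases:.3e}")
print(f"Long-read fraction:         {long_read_fraction:.2f}")
```

Under these assumptions the hybrid approach needs one fifth of the long-read data used by the non-hybrid assembly, regardless of the exact genome size chosen.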