Goodwin Sara, Gurtowski James, Ethe-Sayers Scott, Deshpande Panchajanya, Schatz Michael C, McCombie W Richard
Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA.
Genome Res. 2015 Nov;25(11):1750-6. doi: 10.1101/gr.191395.115. Epub 2015 Oct 7.
For several decades, monitoring the progress of DNA molecules through a membrane pore has been postulated as a method for sequencing DNA. Recently, a nanopore-based sequencing instrument, the Oxford Nanopore MinION, has become available, and we used it to sequence the Saccharomyces cerevisiae genome. To make use of these data, we developed a novel open-source hybrid error correction algorithm, Nanocorr, specifically for Oxford Nanopore reads, because existing packages were incapable of assembling the long read lengths (5-50 kbp) at such high error rates (between ∼5% and 40% error). With this new method, we were able to perform a hybrid error correction of the nanopore reads using complementary MiSeq data and produce a de novo assembly that is highly contiguous and accurate: the contig N50 length is more than ten times greater than that of an Illumina-only assembly (678 kbp versus 59.9 kbp), and the assembly has >99.88% consensus identity when compared to the reference. Furthermore, the assembly with the long nanopore reads presents a much more complete representation of the features of the genome and correctly assembles gene cassettes, rRNAs, transposable elements, and other genomic features that were almost entirely absent in the Illumina-only assembly.
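The contig N50 reported above is a standard contiguity statistic: the length of the contig at which, going from longest to shortest, the running total first reaches half of the total assembly length. The following is a minimal sketch of that calculation, not code from Nanocorr or from the assembly pipeline used in the study. The function name `n50` and the contig lengths in the example are hypothetical and chosen only for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")

    # Walk the contigs from longest to shortest.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2

    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to half of
        # the assembly size defines the N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly (lengths in kbp), not data from the study.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

In this toy case the total is 300 kbp, so the half-way point of 150 kbp is reached by the two longest contigs (80 + 70), and the N50 is 70 kbp. A higher N50 at a comparable total assembly size indicates that the genome is represented by fewer, longer contigs.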