Jain Miten, Tyson John R, Loose Matthew, Ip Camilla L C, Eccles David A, O'Grady Justin, Malla Sunir, Leggett Richard M, Wallerman Ola, Jansen Hans J, Zalunin Vadim, Birney Ewan, Brown Bonnie L, Snutch Terrance P, Olsen Hugh E
University of California at Santa Cruz, Santa Cruz, CA, USA.
Michael Smith Laboratories and Djavad Mowafaghian Centre for Brain Health, University of British Columbia, Vancouver, Canada.
F1000Res. 2017 May 31;6:760. doi: 10.12688/f1000research.11354.1. eCollection 2017.
Long-read sequencing is rapidly evolving and reshaping the suite of opportunities for genomic analysis. For the MinION in particular, as both the platform and chemistry develop, the user community requires reference data to set performance expectations and maximally exploit third-generation sequencing. We performed an analysis of MinION data derived from whole genome sequencing of Escherichia coli K-12 using the R9.0 chemistry, comparing the results with the older R7.3 chemistry.
We computed the error-rate estimates for insertions, deletions, and mismatches in MinION reads.
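As a rough illustration of what such an estimate involves, the sketch below tallies mismatches, insertions, and deletions from one pairwise alignment given as two gapped strings. It is not the authors' pipeline: the function name `error_rates`, the gapped-string input, and the choice of alignment length as the denominator are all assumptions made for this example.

```python
def error_rates(ref_aln: str, read_aln: str) -> dict:
    """Per-column error rates for one gapped pairwise alignment.

    Hypothetical helper: '-' marks a gap, and rates are divided by the
    number of alignment columns (an assumed convention, not the paper's).
    """
    if len(ref_aln) != len(read_aln):
        raise ValueError("aligned strings must have equal length")

    counts = {"match": 0, "mismatch": 0, "insertion": 0, "deletion": 0}
    for ref_base, read_base in zip(ref_aln, read_aln):
        if ref_base == "-" and read_base == "-":
            continue  # empty column, carries no information
        if ref_base == "-":
            counts["insertion"] += 1  # base present only in the read
        elif read_base == "-":
            counts["deletion"] += 1   # base present only in the reference
        elif ref_base == read_base:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1

    columns = sum(counts.values())
    if columns == 0:
        raise ValueError("alignment has no informative columns")

    rates = {k: counts[k] / columns for k in ("mismatch", "insertion", "deletion")}
    rates["total_error"] = sum(rates.values())
    rates["identity"] = counts["match"] / columns
    return rates


if __name__ == "__main__":
    # Toy alignment: one mismatch, one insertion, one deletion in 10 columns.
    print(error_rates("ACGT-ACGTA", "ACCTTAC-TA"))
```

In practice these counts would usually be derived from an aligner's output across many reads and then aggregated; other denominators (read length or reference span) are also in use and give slightly different rates.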
Run-time characteristics of the flow cell and run scripts for R9.0 were similar to those observed for R7.3 chemistry, but with an 8-fold increase in bases per second (from 30 bps in R7.3 and SQK-MAP005 library preparation, to 250 bps in R9.0) processed by individual nanopores, and less drop-off in yield over time. The 2-dimensional ("2D") N50 read length was unchanged from the prior chemistry. Using the proportion of alignable reads as a measure of base-call accuracy, 99.9% of "pass" template reads from 1-dimensional ("1D") experiments and ~97% of "pass" reads from 2D experiments were mappable. The median identity of reads was ~89% for 1D and ~94% for 2D experiments. The total error rate (miscall + insertion + deletion) decreased for 2D "pass" reads from 9.1% in R7.3 to 7.5% in R9.0 and for template "pass" reads from 26.7% in R7.3 to 14.5% in R9.0.
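For readers unfamiliar with the statistics quoted here, the sketch below shows one common way to compute an N50 read length and reproduces the arithmetic behind the quoted fold increase. The function names `n50` and `fold_change` are hypothetical, and the N50 definition used (the length at which reads of that length or longer contain at least half of all bases) is the conventional one, assumed rather than taken from the paper.

```python
def n50(read_lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    lengths = sorted(read_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one read length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


def fold_change(old, new):
    """Ratio of a new value to an old one, e.g. nanopore speed in bases/second."""
    return new / old


if __name__ == "__main__":
    # Toy read lengths, not data from the study.
    print(n50([2, 3, 4, 5, 6]))
    # Speeds quoted in the abstract: 30 (R7.3) and 250 (R9.0) bases per second.
    print(round(fold_change(30, 250), 1))
```

The ratio 250/30 is about 8.3, consistent with the "8-fold" figure reported above.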
These Phase 2 MinION experiments serve as a baseline by providing estimates for read quality, throughput, and mappability. The datasets further enable the development of bioinformatic tools tailored to the new R9.0 chemistry and the design of novel biological applications for this technology.
K: thousand, Kb: kilobase (one thousand base pairs), M: million, Mb: megabase (one million base pairs), Gb: gigabase (one billion base pairs).