Srivathsan Amrita, Baloğlu Bilgenur, Wang Wendy, Tan Wei X, Bertrand Denis, Ng Amanda H Q, Boey Esther J H, Koh Jayce J Y, Nagarajan Niranjan, Meier Rudolf
Department of Biological Sciences, National University of Singapore, Singapore.
Lee Kong Chian Natural History Museum, Singapore.
Mol Ecol Resour. 2018 Apr 19. doi: 10.1111/1755-0998.12890.
DNA barcodes are useful for species discovery and species identification, but obtaining barcodes currently requires a well-equipped molecular laboratory and is time-consuming and/or expensive. We address these issues by developing a barcoding pipeline for Oxford Nanopore MinION™ and demonstrating that one flow cell can generate barcodes for ~500 specimens despite the high basecall error rates of MinION™ reads. The pipeline overcomes these errors by first summarizing all reads for the same tagged amplicon as a consensus barcode. Consensus barcodes are largely mismatch-free but retain indel errors that are concentrated in homopolymeric regions. These indel errors are addressed with an optional error correction pipeline that is based on conserved amino acid motifs from publicly available barcodes. The effectiveness of this pipeline is documented by analysing reads from three MinION™ runs that represent three different stages of MinION™ development. These runs generated data for (i) 511 specimens of a mixed Diptera sample, (ii) 575 specimens of ants and (iii) 50 specimens of Chironomidae. The run based on the latest chemistry yielded MinION™ barcodes for 490 of the 511 specimens, which were assessed against reference Sanger barcodes (N = 471). Overall, the MinION™ barcodes have an accuracy of 99.3%-100%, with the proportion of ambiguous bases after correction ranging from <0.01% to 1.5% depending on which correction pipeline is used. We demonstrate that ~2 hr of sequencing is required to gather all information needed for obtaining reliable barcodes for most specimens (>90%). We estimate that up to 1,000 barcodes can be generated in one flow cell and that the cost per barcode can be <USD 2.
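The consensus step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the reads for one tagged amplicon have already been demultiplexed and aligned to equal length (with `-` marking gaps), and it uses a simple per-column majority vote. The function name `consensus_barcode` and the `min_fraction` threshold are hypothetical choices made for this illustration only.

```python
from collections import Counter


def consensus_barcode(aligned_reads, min_fraction=0.5, gap="-"):
    """Majority-vote consensus over pre-aligned reads of equal length.

    Assumption: reads are already aligned; this sketch does no alignment.
    """
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")

    consensus = []
    for column in zip(*aligned_reads):
        # Most frequent symbol in this alignment column.
        base, count = Counter(column).most_common(1)[0]
        if count / len(column) <= min_fraction:
            # No symbol has a clear majority: emit an ambiguous base.
            consensus.append("N")
        elif base != gap:
            # Keep the majority base; a majority gap column is dropped.
            consensus.append(base)
    return "".join(consensus)


# Toy example: five noisy reads of the same amplicon, each with at most
# one error (a mismatch, an insertion, or a deletion).
reads = [
    "ACGTTGCA-T",
    "ACGTTGCA-T",
    "ACCTTGCA-T",  # mismatch at the third position
    "ACGTTGCAAT",  # insertion
    "ACGT-GCA-T",  # deletion
]
barcode = consensus_barcode(reads)
print(barcode)  # ACGTTGCAT
```

In this toy case the random errors are outvoted because each affects only one read. Errors that recur in many reads, such as the homopolymer indels mentioned in the abstract, would survive a vote like this one, which is consistent with the need for a separate correction step.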