Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, Victoria, Australia.
PLoS Comput Biol. 2018 Nov 20;14(11):e1006583. doi: 10.1371/journal.pcbi.1006583. eCollection 2018 Nov.
Multiplexing, the simultaneous sequencing of multiple barcoded DNA samples on a single flow cell, has made Oxford Nanopore sequencing cost-effective for small genomes. However, it depends on the ability to sort the resulting sequencing reads by barcode, and current demultiplexing tools fail to classify many reads. Here we present Deepbinner, a tool for Oxford Nanopore demultiplexing that uses a deep neural network to classify reads based on the raw electrical read signal. This 'signal-space' approach allows for greater accuracy than existing 'base-space' tools (Albacore and Porechop) for which signals must first be converted to DNA base calls, itself a complex problem that can introduce noise into the barcode sequence. To assess Deepbinner and existing tools, we performed multiplex sequencing on 12 amplicons chosen for their distinguishability. This allowed us to establish a ground truth classification for each read based on internal sequence alone. Deepbinner had the lowest rate of unclassified reads (7.8%) and the highest demultiplexing precision (98.5% of classified reads were correctly assigned). It can be used alone (to maximise the number of classified reads) or in conjunction with other demultiplexers (to maximise precision and minimise false positive classifications). We also found cross-sample chimeric reads (0.3%) and evidence of barcode switching (0.3%) in our dataset, which likely arise during library preparation and may be detrimental for quantitative studies that use multiplexing. Deepbinner is open source (GPLv3) and available at https://github.com/rrwick/Deepbinner.
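The combined use described above (keeping a barcode call only when two demultiplexers agree) and the two reported metrics can be illustrated with a minimal sketch. It does not use Deepbinner's actual code or interface: the function names, the dictionary representation of calls, and the toy read IDs and barcode labels are all assumptions made for illustration.

```python
from typing import Dict, Optional

# Hypothetical representation: read ID -> barcode label, or None if unclassified.
Calls = Dict[str, Optional[str]]


def consensus(calls_a: Calls, calls_b: Calls) -> Calls:
    """Keep a barcode only where both tools classified the read and agree."""
    merged: Calls = {}
    for read_id in calls_a.keys() | calls_b.keys():
        a = calls_a.get(read_id)
        b = calls_b.get(read_id)
        merged[read_id] = a if (a is not None and a == b) else None
    return merged


def unclassified_rate(calls: Calls) -> float:
    """Fraction of reads that received no barcode."""
    if not calls:
        return 0.0
    return sum(1 for c in calls.values() if c is None) / len(calls)


def precision(calls: Calls, truth: Dict[str, str]) -> float:
    """Fraction of classified reads whose barcode matches the ground truth."""
    classified = {r: c for r, c in calls.items() if c is not None and r in truth}
    if not classified:
        return 0.0
    correct = sum(1 for r, c in classified.items() if truth[r] == c)
    return correct / len(classified)


# Toy data (invented): calls from a signal-space tool and a base-space tool.
signal_calls: Calls = {"r1": "BC01", "r2": "BC02", "r3": "BC03", "r4": None}
base_calls: Calls = {"r1": "BC01", "r2": "BC05", "r3": None, "r4": None}
truth = {"r1": "BC01", "r2": "BC05", "r3": "BC03", "r4": "BC04"}

merged = consensus(signal_calls, base_calls)
print(unclassified_rate(signal_calls), precision(signal_calls, truth))
print(unclassified_rate(merged), precision(merged, truth))
```

In this toy example the consensus leaves more reads unclassified than the single tool does, but every read it does classify is correct, which mirrors the trade-off between maximising classified reads and maximising precision.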