Department of Earth, Atmospheric and Planetary Sciences, Massachusetts Institute of Technology, 77 Massachusetts Ave, E25-610, Cambridge, MA, 02139, USA.
ReadCoor, Cambridge, MA, USA.
BMC Bioinformatics. 2018 Mar 27;19(1):108. doi: 10.1186/s12859-018-2124-3.
Long-read nanopore sequencing technology is of particular significance for taxonomic identification at or below the species level. For many environmental samples, the total extractable DNA is far below the current input requirements of nanopore sequencing, preventing "sample to sequence" metagenomics from low-biomass or recalcitrant samples.
Here we address this problem by employing carrier sequencing, a method to sequence low-input DNA by preparing the target DNA with a genomic carrier to achieve ideal library preparation and sequencing stoichiometry without amplification. We then use CarrierSeq, a sequence analysis workflow, to separate the low-input target reads from the genomic carrier reads. We tested CarrierSeq experimentally by sequencing a mixture of 0.2 ng Bacillus subtilis ATCC 6633 DNA in a background of 1000 ng Enterobacteria phage λ DNA. After filtering out carrier, low-quality, and low-complexity reads, we detected target reads (B. subtilis), contamination reads, and "high quality noise reads" (HQNRs) that do not map to the carrier, the target, or known lab contaminants. The HQNRs appear to be artifacts of the nanopore sequencing process because they are associated with specific channels (pores).
By treating sequencing as a Poisson arrival process, we implement a statistical test to reject data from channels dominated by HQNRs while retaining low-input target reads.
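The channel-level test can be illustrated with a minimal Python sketch. It is not the CarrierSeq implementation: the function names, the default of 512 channels, the significance level of 0.05, and the Bonferroni-style correction across channels are all assumptions made for illustration. The idea is that if reads of interest arrive uniformly across channels, the per-channel count is approximately Poisson with mean equal to the total number of reads divided by the number of channels, so a channel with an improbably high count is likely dominated by HQNRs and can be discarded.

```python
import math
from collections import Counter


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), computed from the CDF."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term              # add P(X = i)
        term *= lam / (i + 1)    # advance to P(X = i + 1)
    return max(0.0, 1.0 - cdf)


def flag_noisy_channels(read_channels, n_channels=512, alpha=0.05):
    """Flag channels whose read count is improbably high under a uniform
    Poisson model. `read_channels` holds one channel ID per read.
    n_channels and alpha are illustrative assumptions."""
    counts = Counter(read_channels)
    lam = len(read_channels) / n_channels   # expected reads per channel
    threshold = alpha / n_channels          # Bonferroni-style correction (assumed)
    return {ch for ch, k in counts.items() if poisson_sf(k, lam) < threshold}


def keep_reads(read_channels, noisy):
    """Drop every read that came from a flagged channel."""
    return [ch for ch in read_channels if ch not in noisy]


if __name__ == "__main__":
    # Hypothetical data: 100 channels with one read each, plus 40 extra
    # reads concentrated on channel 7.
    reads = list(range(1, 101)) + [7] * 40
    noisy = flag_noisy_channels(reads)
    print(sorted(noisy), len(keep_reads(reads, noisy)))
```

Rejecting whole channels in this way also discards any genuine target reads that happen to fall on a flagged channel, so the choice of threshold trades noise removal against loss of low-input target reads.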