Sadasivan Harisankar, Wadden Jack, Goliya Kush, Ranjan Piyush, Dickson Robert P, Blaauw David, Das Reetuparna, Narayanasamy Satish
Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, 48109, USA.
Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, 48109, USA.
Arch Clin Biomed Res. 2023;7(1):45-57. doi: 10.26502/acbr.50170318. Epub 2023 Jan 28.
Read Until enables Oxford Nanopore Technologies' (ONT) sequencers to selectively sequence reads of target species in real time. This enables efficient microbial enrichment for applications such as microbial abundance estimation, and it is particularly beneficial for metagenomic samples with a very high fraction of non-target reads (> 99% can be human reads). However, Read Until requires a fast and accurate software filter that analyzes a short prefix of a read and determines whether it belongs to a microbe of interest (target). The baseline Read Until pipeline uses Guppy, a deep neural network-based basecaller, and is slow and inaccurate for this task (~60% of sequenced bases are unclassified). We present RawMap, an efficient, CPU-only, microbial-species-agnostic Read Until classifier that filters non-target human reads in the squiggle space. RawMap uses a Support Vector Machine (SVM) trained to distinguish human from microbial reads using non-linear and non-stationary characteristics of ONT's squiggle output (continuous electrical signals). Compared to the baseline Read Until pipeline, RawMap is a 1327× faster classifier and significantly reduces sequencing time, sequencing cost, and compute time. We show that RawMap-augmented pipelines reduce sequencing time and cost by ~24% and computing cost by 22%. Additionally, because RawMap is agnostic to microbial species, it can also classify microbial species it was not trained on. We also discuss how RawMap may be used as an alternative to the RT-PCR test for viral load quantification of SARS-CoV-2.
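The abstract states only that RawMap classifies a read's squiggle prefix with an SVM and ejects non-target human reads; it does not specify the features or kernel. As a minimal sketch under stated assumptions, the Python below substitutes two hypothetical prefix features (standard deviation and mean absolute first difference), z-scores them, trains a plain linear SVM with the Pegasos stochastic sub-gradient method, and maps the sign of the score to a Read Until action. The function names, the features, the 2,000-sample prefix length, and the synthetic squiggles are illustrative assumptions. This is not RawMap's implementation, feature set, or ONT's API.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def prefix_features(signal: Sequence[float], prefix_len: int = 2000) -> Vector:
    """Hypothetical features of a read's raw-signal prefix (NOT RawMap's).

    Returns [standard deviation, mean absolute first difference]: simple
    stand-ins for how variable and how jagged the squiggle prefix is.
    """
    x = list(signal[:prefix_len])
    if not x:
        raise ValueError("empty signal prefix")
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    jaggedness = sum(abs(b - a) for a, b in zip(x, x[1:])) / max(n - 1, 1)
    return [std, jaggedness]


def fit_scaler(X: List[Vector]) -> Tuple[Vector, Vector]:
    """Per-feature mean and standard deviation for z-scoring (sd 0 becomes 1)."""
    d = len(X[0])
    mu = [sum(row[j] for row in X) / len(X) for j in range(d)]
    sd = [math.sqrt(sum((row[j] - mu[j]) ** 2 for row in X) / len(X)) or 1.0
          for j in range(d)]
    return mu, sd


def scale(x: Vector, mu: Vector, sd: Vector) -> Vector:
    # z-score each feature, then append 1.0 so the last weight acts as a bias
    return [(v - m) / s for v, m, s in zip(x, mu, sd)] + [1.0]


def train_linear_svm(X: List[Vector], y: List[int], lam: float = 0.01,
                     epochs: int = 100, seed: int = 0) -> Vector:
    """Linear SVM trained with Pegasos (stochastic sub-gradient, hinge loss).

    Rows of X must already be scaled (bias column included); y holds +1/-1.
    For simplicity, the bias weight is regularized along with the others.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrink step
            if margin < 1.0:  # hinge loss active: step toward this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def read_until_decision(w: Vector, x_scaled: Vector) -> str:
    """Label +1 means human (non-target), so a positive score ejects the read."""
    score = sum(wj * xj for wj, xj in zip(w, x_scaled))
    return "eject" if score > 0 else "keep sequencing"


def synthetic_read(rng: random.Random, noise: float, n: int = 2000) -> List[float]:
    # Toy "squiggle": constant baseline plus Gaussian noise; NOT real ONT data
    return [90.0 + rng.gauss(0.0, noise) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy assumption: class +1 ("human") is quieter than class -1 ("microbe")
    raw = [(synthetic_read(rng, 1.0), 1) for _ in range(20)] + \
          [(synthetic_read(rng, 3.0), -1) for _ in range(20)]
    feats = [prefix_features(s) for s, _ in raw]
    mu, sd = fit_scaler(feats)
    w = train_linear_svm([scale(f, mu, sd) for f in feats], [lab for _, lab in raw])
    for noise in (1.0, 3.0):
        new_read = synthetic_read(rng, noise)
        print(noise, read_until_decision(w, scale(prefix_features(new_read), mu, sd)))
```

The point this illustrates is that the eject/keep choice is computed directly from raw signal, with no basecalling step between acquisition and the decision; a kernel SVM or richer non-stationary features, as the abstract describes for RawMap, would slot into the same train/score structure.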