Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Germany.
Bioinformatics Unit (MF1), Robert Koch Institute, 13353 Berlin, Germany.
Bioinformatics. 2022 Jun 24;38(Suppl 1):i153-i160. doi: 10.1093/bioinformatics/btac223.
Nanopore sequencers allow targeted sequencing of nucleotide sequences of interest by rejecting other sequences from individual pores. This feature facilitates the enrichment of low-abundance sequences by depleting overrepresented ones in silico. Existing tools for adaptive sampling either apply signal alignment, which cannot handle human-sized reference sequences, or apply read mapping in sequence space, relying on fast graphics processing unit (GPU) base callers for real-time read rejection. Moreover, nanopore long-read mapping tools are not optimal for mapping the shorter reads that are usually analyzed in adaptive sampling applications.
Here, we present ReadBouncer, a new approach for nanopore adaptive sampling that combines fast CPU and GPU base calling with read classification based on Interleaved Bloom Filters. ReadBouncer improves the potential enrichment of low-abundance sequences through its high read classification sensitivity and specificity, outperforming existing tools in the field. It robustly removes reads even when they belong to large reference sequences, and it runs on commodity hardware without GPUs, making adaptive sampling accessible for in-field researchers. ReadBouncer also provides a user-friendly interface and installer files for end-users without a bioinformatics background.
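To illustrate the idea of read classification with an Interleaved Bloom Filter, the following is a minimal Python sketch and not ReadBouncer's C++ implementation. It assumes one bin per reference sequence, plain k-mer hashing with BLAKE2b, no reverse-complement handling, and a fixed fraction of matching k-mers as the decision rule. All names and parameters (`InterleavedBloomFilter`, `build_filter`, `classify_read`, `bits_per_bin`, `min_fraction`) are hypothetical and chosen only for this example.

```python
import hashlib


class InterleavedBloomFilter:
    """Toy interleaved Bloom filter: one Bloom filter per bin, stored row-wise.

    rows[p] is an integer whose bit b tells whether hash position p is set
    for bin b, so one lookup per hash function answers all bins at once.
    """

    def __init__(self, num_bins, bits_per_bin, num_hashes=3):
        self.num_bins = num_bins
        self.bits_per_bin = bits_per_bin
        self.num_hashes = num_hashes
        self.rows = [0] * bits_per_bin

    def _positions(self, kmer):
        # Derive num_hashes positions by salting BLAKE2b with the hash index.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                kmer.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.bits_per_bin

    def insert(self, kmer, bin_id):
        for pos in self._positions(kmer):
            self.rows[pos] |= 1 << bin_id

    def query(self, kmer):
        # AND the rows of all hash positions; surviving bits are candidate bins.
        mask = (1 << self.num_bins) - 1
        for pos in self._positions(kmer):
            mask &= self.rows[pos]
        return mask


def kmers(seq, k):
    """Yield all overlapping k-mers of seq (nothing if seq is shorter than k)."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_filter(references, k, bits_per_bin=4096):
    """Index each reference sequence into its own bin (assumed layout)."""
    ibf = InterleavedBloomFilter(len(references), bits_per_bin)
    for bin_id, seq in enumerate(references):
        for kmer in kmers(seq, k):
            ibf.insert(kmer, bin_id)
    return ibf


def classify_read(ibf, read, k, min_fraction=0.5):
    """Return True if some bin contains at least min_fraction of the read's k-mers."""
    counts = [0] * ibf.num_bins
    total = 0
    for kmer in kmers(read, k):
        total += 1
        hits = ibf.query(kmer)
        for b in range(ibf.num_bins):
            if (hits >> b) & 1:
                counts[b] += 1
    if total == 0:
        return False
    return max(counts) / total >= min_fraction


if __name__ == "__main__":
    # Hypothetical depletion target made of two short toy sequences.
    refs = ["ACGTTGCA" * 10 + "GATTACA" * 5, "TTGACCGT" * 12]
    ibf = build_filter(refs, k=13)
    read_prefix = refs[0][10:60]  # stands in for the first bases of a read
    # A read taken from an indexed reference always matches, because
    # Bloom filters never produce false negatives.
    print(classify_read(ibf, read_prefix, k=13))
```

In an adaptive sampling setting, a read whose base-called prefix matches a depletion target in this way would be the one rejected from the pore. A sketch like this can report false positives, whose rate depends on `bits_per_bin` and `num_hashes`, and a practical classifier would also need to tolerate sequencing errors in the k-mers.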
The C++ source code is available at https://gitlab.com/dacs-hpi/readbouncer.
Supplementary data are available at Bioinformatics online.