Biomedical Informatics Group, Department of Computer Science, ETH Zurich, Zürich, 8092, Switzerland.
Empirical Inference, Max Planck Institute for Intelligent Systems, Tübingen, 72076, Germany.
Bioinformatics. 2024 May 2;40(5). doi: 10.1093/bioinformatics/btae199.
The Oxford Nanopore Technologies (ONT) ReadUntil API enables selective sequencing, which aims to favor interesting over uninteresting reads, e.g. to deplete or enrich certain genomic regions. The performance gain depends on the selective sequencing decision-making algorithm (SSDA), which decides whether to reject a read, stop receiving data from it, or wait for more data. Since real runs are time-consuming and costly, a simulator of the ONT sequencer that supports the ReadUntil API is highly beneficial for comparing and optimizing new SSDAs. Existing software such as MinKNOW and UNCALLED returns only raw signal data, is memory-intensive, requires huge and often unavailable multi-fast5 files (≥100 GB), and is not clearly documented.
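To make the three-way decision concrete, here is a minimal sketch of an enrichment SSDA that operates on basecalled read prefixes. It is an illustration under stated assumptions, not the authors' algorithm or the ReadFish implementation. `Action`, `decide`, the k-mer size of 11 and the 200-base minimum are all hypothetical, and exact k-mer matching against a target region stands in for the read-to-reference mapping that real tools perform.

```python
from enum import Enum


class Action(Enum):
    WAIT = "wait"                      # not enough data yet, ask again later
    REJECT = "reject"                  # eject the read from the pore
    STOP_RECEIVING = "stop_receiving"  # keep sequencing, but send no more data


def kmers(seq: str, k: int) -> set:
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def decide(prefix: str, target_kmers: set, k: int = 11, min_len: int = 200) -> Action:
    """Toy enrichment SSDA on a basecalled read prefix (hypothetical).

    Exact k-mer matching is a stand-in for mapping the prefix to a reference.
    """
    if len(prefix) < min_len:
        return Action.WAIT            # too short to classify reliably
    if kmers(prefix, k) & target_kmers:
        return Action.STOP_RECEIVING  # on target: let the read finish
    return Action.REJECT              # off target: free the pore for a new read


target = kmers("ACGTACGTTGCA" * 3, 11)  # hypothetical target region
print(decide("ACGT", target))           # Action.WAIT
print(decide("T" * 300, target))        # Action.REJECT
```

For depletion rather than enrichment, the last two branches would simply be swapped.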
We present the ONT device simulator SimReadUntil, which takes a set of full reads as input, distributes them to channels, and plays them back in real time, including mux scans, channel gaps and blockages; it allows rejecting reads as well as stopping the reception of data from them. Our modified ReadUntil API provides basecalled reads rather than the raw signal, which reduces the computational load and keeps the focus on the SSDA rather than on basecalling. Tuning the parameters of tools such as ReadFish and ReadBouncer becomes easier because a GPU for basecalling is no longer required. We offer various methods to extract simulation parameters from a sequencing summary file, and we adapt ReadFish to replicate one of the enrichment experiments from the ReadFish publication. SimReadUntil's gRPC interface allows standardized interaction from a wide range of programming languages.
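The playback idea can be illustrated with a toy single-channel model. In this model, a read's bases become available at a fixed rate, and the caller can either reject the read or stop receiving data from it. This is a hypothetical sketch and not SimReadUntil's API. The class `ToyChannel`, its methods, the assumed speed of 450 bases per second and the 0.5 s gap between reads are illustrative choices, and mux scans and blockages are omitted. Time is passed in explicitly instead of read from a wall clock, so the behaviour is deterministic; callers are assumed to pass non-decreasing times.

```python
from collections import deque
from typing import List, Optional, Tuple


class ToyChannel:
    """Hypothetical single-channel playback model (not SimReadUntil's API)."""

    def __init__(self, reads, bases_per_second=450.0, gap_seconds=0.5):
        self.queue = deque(reads)    # (read_id, sequence) pairs, in playback order
        self.bps = bases_per_second  # assumed constant translocation speed
        self.gap = gap_seconds       # assumed idle time between two reads
        self.finished: List[Tuple[str, int, str]] = []  # (read_id, bases sequenced, how it ended)
        self._load_next(0.0)

    def _load_next(self, start_time: float) -> None:
        self.current = self.queue.popleft() if self.queue else None
        self.start_time = start_time
        self.sent = 0          # bases already returned to the caller
        self.receiving = True  # set to False by stop_receiving()

    def _bases_at(self, t: float) -> int:
        """Bases of the current read that have passed the pore by time t."""
        _, seq = self.current
        return min(len(seq), max(0, int((t - self.start_time) * self.bps)))

    def _advance(self, t: float) -> None:
        """Finish every read that has fully passed the pore by time t."""
        while self.current is not None:
            read_id, seq = self.current
            end_time = self.start_time + len(seq) / self.bps
            if end_time > t:
                break
            self.finished.append((read_id, len(seq), "complete"))
            self._load_next(end_time + self.gap)

    def get_chunk(self, t: float) -> Optional[Tuple[str, str]]:
        """Return (read_id, bases new since the last call), or None."""
        self._advance(t)
        if self.current is None or not self.receiving:
            return None
        read_id, seq = self.current
        n = self._bases_at(t)
        if n <= self.sent:
            return None
        chunk, self.sent = seq[self.sent:n], n
        return read_id, chunk

    def reject(self, read_id: str, t: float) -> None:
        """Eject the read early; only the bases sequenced so far are kept."""
        self._advance(t)
        if self.current is not None and self.current[0] == read_id:
            self.finished.append((read_id, self._bases_at(t), "rejected"))
            self._load_next(t + self.gap)

    def stop_receiving(self, read_id: str) -> None:
        """Keep sequencing the read but send no further chunks for it."""
        if self.current is not None and self.current[0] == read_id:
            self.receiving = False
```

Calling `reject` ends the current read early and frees the channel for the next read after the gap. In contrast, `stop_receiving` lets the read run to completion while silencing further chunks. The read-ID check keeps a stale decision from hitting a read that has already replaced the intended one.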
Code and fully worked examples are available on GitHub (https://github.com/ratschlab/sim_read_until).