RawHash2: mapping raw nanopore signals using hash-based seeding and adaptive quantization.

Affiliation

Department of Information Technology and Electrical Engineering, ETH Zurich, Zurich 8092, Switzerland.

Publication information

Bioinformatics. 2024 Aug 2;40(8). doi: 10.1093/bioinformatics/btae478.

Abstract

SUMMARY

Raw nanopore signals can be analyzed while they are being generated, a process known as real-time analysis. Real-time analysis of raw signals is essential to utilize the unique features that nanopore sequencing provides, enabling the early stopping of the sequencing of a read or of the entire sequencing run based on the analysis. The state-of-the-art mechanism, RawHash, offers the first hash-based, efficient, and accurate similarity identification between raw signals and a reference genome by quickly matching their hash values. In this work, we introduce RawHash2, which provides major improvements over RawHash, including more sensitive quantization and chaining algorithms, weighted mapping decisions, frequency filters to reduce ambiguous seed hits, minimizers for hash-based sketching, and support for the R10.4 flow cell version and the POD5 and SLOW5 file formats. RawHash2 provides better F1 accuracy (on average by 10.57% and up to 20.25%) and better throughput (on average by 4.0× and up to 9.9×) than RawHash.
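As an illustration of three of the ideas named in the summary (quantization, minimizer-based sketching, and frequency filtering of seeds), the following is a minimal Python sketch. It is not taken from the RawHash2 code base: the function names, parameters, and default values are hypothetical, and plain uniform bucketing stands in for the adaptive quantization scheme, whose details are not given here.

```python
from collections import Counter
from typing import List, Tuple


def quantize(events: List[float], lo: float, hi: float, n_buckets: int) -> List[int]:
    """Map each signal value to an integer bucket in [0, n_buckets - 1].

    Assumption: uniform buckets over [lo, hi]; values outside are clamped.
    """
    width = (hi - lo) / n_buckets
    out = []
    for v in events:
        b = int((v - lo) / width)
        out.append(min(max(b, 0), n_buckets - 1))
    return out


def kmer_hashes(q: List[int], k: int, n_buckets: int) -> List[int]:
    """Pack every run of k consecutive quantized values into one integer."""
    hashes = []
    for i in range(len(q) - k + 1):
        h = 0
        for x in q[i:i + k]:
            h = h * n_buckets + x  # base-n_buckets positional encoding
        hashes.append(h)
    return hashes


def minimizers(hashes: List[int], w: int) -> List[Tuple[int, int]]:
    """Keep the smallest hash (leftmost on ties) of each window of w hashes."""
    picked: List[Tuple[int, int]] = []
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        m = min(window)
        pos = start + window.index(m)
        # Adjacent windows often share a minimizer; record it only once.
        if not picked or picked[-1][0] != pos:
            picked.append((pos, m))
    return picked


def frequency_filter(seeds: List[Tuple[int, int]], max_occ: int) -> List[Tuple[int, int]]:
    """Drop seeds whose hash value occurs more than max_occ times."""
    counts = Counter(h for _, h in seeds)
    return [(p, h) for p, h in seeds if counts[h] <= max_occ]


if __name__ == "__main__":
    q = quantize([0.0, 0.5, 0.99, -5.0, 5.0], lo=0.0, hi=1.0, n_buckets=4)
    seeds = minimizers(kmer_hashes(q, k=2, n_buckets=4), w=2)
    print(frequency_filter(seeds, max_occ=1))
```

In this sketch, sampling only window minima reduces the number of seeds that need to be stored and looked up, and the frequency filter discards hash values that occur too often to point to a single location. Both are meant only to convey the general techniques, not the specific choices made in RawHash2.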

AVAILABILITY AND IMPLEMENTATION

RawHash2 is available at https://github.com/CMU-SAFARI/RawHash. We also provide the scripts to fully reproduce our results on our GitHub page.
