ntHash2: recursive spaced seed hashing for nucleotide sequences.

Affiliations

Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC V5Z 4S6, Canada.

Faculty of Science, University of British Columbia, Vancouver, BC V6T 1Z4, Canada.

Publication details

Bioinformatics. 2022 Oct 14;38(20):4812-4813. doi: 10.1093/bioinformatics/btac564.

Abstract

MOTIVATION

Spaced seeds are robust alternatives to k-mers in analyzing nucleotide sequences with high base mismatch rates. Hashing is also crucial for efficiently storing abundant sequence data. Here, we introduce ntHash2, a fast algorithm for spaced seed hashing that can be integrated into various bioinformatics tools for efficient sequence analysis with applications in genome research.
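To make the mismatch-robustness point concrete: a spaced seed marks which positions of a window are "care" positions (1) and which are ignored (0), and only the care positions feed the hash. The minimal sketch below extracts the care positions and hashes them with a generic 64-bit hash (BLAKE2b from Python's standard library, chosen only for illustration). This extract-then-hash approach is one way of naively adapting a conventional hash to spaced seeds, the kind of baseline the abstract compares against. It is not ntHash2's method, and the seed pattern and function names are hypothetical.

```python
import hashlib


def spaced_seed_key(kmer: str, seed: str) -> str:
    """Keep only the bases at the seed's care ('1') positions."""
    if len(kmer) != len(seed):
        raise ValueError("k-mer and seed must have the same length")
    return "".join(base for base, mask in zip(kmer, seed) if mask == "1")


def spaced_seed_hash(kmer: str, seed: str) -> int:
    """Hash the care positions with a generic 64-bit hash (BLAKE2b, illustrative)."""
    key = spaced_seed_key(kmer, seed).encode("ascii")
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "little")


if __name__ == "__main__":
    seed = "1101011"  # hypothetical spaced seed: positions 2 and 4 are ignored
    print(spaced_seed_key("ACGTACG", seed))  # care bases only
    # A mismatch at a don't-care position (2) leaves the hash unchanged.
    print(spaced_seed_hash("ACGTACG", seed) == spaced_seed_hash("ACTTACG", seed))
```

With seed 1101011, ACGTACG and ACTTACG differ only at position 2, a don't-care position, so they hash identically; ACGAACG differs at care position 3 and does not. Note that every k-mer is rehashed from scratch here, which is the cost a recursive, rolling approach avoids.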

RESULTS

ntHash2 is up to 2.1× faster at hashing various spaced seeds than the previous version and 3.8× faster than conventional hashing algorithms with naïve adaptation. Additionally, we reduced the collision rate of ntHash for longer k-mer lengths and improved the uniformity of the hash distribution by modifying the canonical hashing mechanism.
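The abstract does not spell out the hashing mechanics. As a rough illustration, ntHash-family hashes are rolling hashes: each base maps to a 64-bit seed value, seeds are combined with XOR after bit rotations, and sliding the window by one base costs constant time. The sketch below assumes that scheme with hypothetical seed constants, and combines the forward and reverse-complement hashes by summation to get a canonical (strand-independent) value; that combination is an assumption, not a statement of exactly how ntHash2 modified its canonical hashing. It also shows one plausible source of collisions for longer k-mers: a plain 64-bit rotation by 64 is the identity, so bases 64 positions apart receive the same rotation and can be swapped without changing the hash. Rotating a 33-bit and a 31-bit part independently (an assumed layout; the library's exact bit layout may differ) lengthens the period to lcm(33, 31) = 1023.

```python
MASK64 = (1 << 64) - 1
LO_BITS, HI_BITS = 33, 31  # hypothetical split of the 64-bit word
LO_MASK = (1 << LO_BITS) - 1

# Arbitrary 64-bit constants per base (hypothetical seed table).
SEED = {
    "A": 0x9E3779B97F4A7C15,
    "C": 0xC2B2AE3D27D4EB4F,
    "G": 0x165667B19E3779F9,
    "T": 0xD6E8FEB86659FD93,
}
COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def rot(x: int, n: int, width: int) -> int:
    """Rotate a width-bit value left by n bits (negative n rotates right)."""
    n %= width
    mask = (1 << width) - 1
    return ((x << n) | (x >> (width - n))) & mask


def plain_rol(x: int, n: int) -> int:
    """Ordinary 64-bit rotation: period 64."""
    return rot(x, n, 64)


def split_rol(x: int, n: int) -> int:
    """Rotate the low 33 and high 31 bits independently: period 1023."""
    lo, hi = x & LO_MASK, x >> LO_BITS
    return (rot(hi, n, HI_BITS) << LO_BITS) | rot(lo, n, LO_BITS)


def fwd_hash(kmer: str, rol=split_rol) -> int:
    """XOR of per-base seeds, base i rotated by (k - 1 - i)."""
    k, h = len(kmer), 0
    for i, base in enumerate(kmer):
        h ^= rol(SEED[base], k - 1 - i)
    return h


def rev_hash(kmer: str, rol=split_rol) -> int:
    """Forward hash of the reverse complement, without building it."""
    h = 0
    for i, base in enumerate(kmer):
        h ^= rol(SEED[COMP[base]], i)
    return h


def canonical(f: int, r: int) -> int:
    """Symmetric combination (sum mod 2**64): both strands give the same value."""
    return (f + r) & MASK64


def rolling_canonical(seq: str, k: int, rol=split_rol) -> list[int]:
    """Canonical hash of every k-mer in seq, updated in O(1) per step."""
    if len(seq) < k:
        return []
    f, r = fwd_hash(seq[:k], rol), rev_hash(seq[:k], rol)
    hashes = [canonical(f, r)]
    for i in range(len(seq) - k):
        out_b, in_b = seq[i], seq[i + k]
        # Forward: shift left, drop the outgoing base, add the incoming one.
        f = rol(f, 1) ^ rol(SEED[out_b], k) ^ SEED[in_b]
        # Reverse complement: drop the outgoing base, shift right, add incoming.
        r = rol(r ^ SEED[COMP[out_b]], -1) ^ rol(SEED[COMP[in_b]], k - 1)
        hashes.append(canonical(f, r))
    return hashes


if __name__ == "__main__":
    # k = 65: swap the bases at positions 0 and 64.
    x = "A" + "G" * 63 + "C"
    y = "C" + "G" * 63 + "A"
    print(fwd_hash(x, plain_rol) == fwd_hash(y, plain_rol))  # plain rotation
    print(fwd_hash(x, split_rol) == fwd_hash(y, split_rol))  # split rotation
```

Each rolling update touches only the outgoing and incoming bases, so the cost per k-mer does not grow with k, in contrast to the rehash-every-window baseline. The sum-based canonical value is symmetric in the two strands; a min-of-both-strands rule is another common symmetric choice.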

AVAILABILITY AND IMPLEMENTATION

ntHash2 is freely available online at github.com/bcgsc/ntHash under an MIT license.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.


[Figure 1]
