
RabbitSketch: a high-performance sketching library for genome analysis.

Author information

Zhang Tong, Yin Zekun, Xu Xiaoming, Yan Lifeng, Zhu Fangjin, Duan Xiaohui, Schmidt Bertil, Liu Weiguo

Affiliations

School of Software, Shandong University, Jinan 250101, China.

Institute for Computer Science, Johannes Gutenberg University, Mainz 55128, Germany.

Publication information

Bioinformatics. 2025 May 6;41(5). doi: 10.1093/bioinformatics/btaf249.

Abstract

SUMMARY

We present RabbitSketch, a highly optimized library of sketching algorithms, such as MinHash, OrderMinHash, and HyperLogLog, that can exploit the power of modern multi-core CPUs. It provides significant speedups over existing implementations, ranging from 2.30× to 49.55×, as well as flexible and easy-to-use interfaces for both Python and C++. As a result, the similarity analysis of 455 GB of genomic data can be completed in only 5 minutes using RabbitSketch with merely 20 lines of Python code. As a case study, we enhanced RabbitTClust by integrating RabbitSketch's Kssd algorithm, resulting in a 1.54× speedup with no loss in accuracy.
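
To illustrate what a MinHash sketch computes in this kind of similarity analysis, here is a minimal pure-Python sketch of the bottom-s (Mash-style) MinHash variant. It is not RabbitSketch's API or implementation: the function names (`kmers`, `bottom_sketch`, `jaccard_estimate`, `exact_jaccard`), the use of canonical k-mers, the blake2b hash, and the defaults k = 21 and s = 1000 are all illustrative assumptions. Each genome is reduced to its s smallest k-mer hashes, and the Jaccard index of two genomes is estimated from those small sketches instead of the full k-mer sets.

```python
import hashlib
import random

# Illustrative choices (assumptions, not RabbitSketch internals).
_COMP = str.maketrans("ACGT", "TGCA")
_BASES = frozenset("ACGT")


def kmers(seq: str, k: int) -> set[str]:
    """Return canonical k-mers (min of k-mer and reverse complement), skipping non-ACGT k-mers."""
    seq = seq.upper()
    out = set()
    for i in range(len(seq) - k + 1):
        km = seq[i:i + k]
        if set(km) <= _BASES:
            rc = km.translate(_COMP)[::-1]
            out.add(min(km, rc))
    return out


def h64(kmer: str) -> int:
    """Deterministic 64-bit hash via blake2b (an assumed stand-in for e.g. MurmurHash3)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "little")


def bottom_sketch(seq: str, k: int = 21, s: int = 1000) -> list[int]:
    """Hypothetical helper: keep the s smallest distinct k-mer hashes (bottom-s MinHash)."""
    return sorted({h64(km) for km in kmers(seq, k)})[:s]


def jaccard_estimate(sa: list[int], sb: list[int], s: int) -> float:
    """Fraction of the s smallest hashes of the union that appear in both sketches."""
    union_bottom = sorted(set(sa) | set(sb))[:s]
    if not union_bottom:
        return 0.0
    shared = set(sa) & set(sb)
    return sum(h in shared for h in union_bottom) / len(union_bottom)


def exact_jaccard(a: str, b: str, k: int = 21) -> float:
    """Exact Jaccard index of the two canonical k-mer sets, for comparison."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


if __name__ == "__main__":
    random.seed(0)
    a = "".join(random.choice("ACGT") for _ in range(20000))
    # b shares its first 15 kb with a, then diverges randomly.
    b = a[:15000] + "".join(random.choice("ACGT") for _ in range(5000))
    est = jaccard_estimate(bottom_sketch(a), bottom_sketch(b), 1000)
    print(f"estimated J = {est:.3f}, exact J = {exact_jaccard(a, b):.3f}")
```

The point of the design is that each sketch has a fixed size s regardless of genome length, so all-vs-all comparisons touch only small sorted integer lists; when s is at least the number of distinct k-mers in both inputs, the estimate coincides with the exact Jaccard index.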

AVAILABILITY AND IMPLEMENTATION

RabbitSketch is available at https://github.com/RabbitBio/RabbitSketch with an archived version at Zenodo: https://doi.org/10.5281/zenodo.14903962. Detailed API documentation is available at https://rabbitsketch.readthedocs.io/en/latest.

[Figure 1 image: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e553/12054975/92395eda0d69/btaf249f1.jpg]
