RabbitMash: accelerating hash-based genome analysis on modern multi-core architectures.

Affiliations

School of Software, Shandong University, Jinan 250101, China.

Shenzhen Research Institute of Shandong University, Shenzhen 518063, China.

Publication information

Bioinformatics. 2021 May 5;37(6):873-875. doi: 10.1093/bioinformatics/btaa754.

DOI:10.1093/bioinformatics/btaa754

PMID:32845281

Abstract

MOTIVATION

Mash is a popular hash-based genome analysis toolkit with applications to important downstream analysis tasks such as clustering and assembly. However, Mash is currently unable to fully exploit the capabilities of modern multi-core architectures, which leads to high runtimes on large-scale genomic datasets.

RESULTS

We present RabbitMash, an efficient, highly optimized implementation of Mash that takes full advantage of modern hardware features, including multi-threading, vectorization and fast I/O. We show that our approach achieves speedups of at least 1.3, 9.8, 8.5 and 4.4 over Mash for the operations sketch, dist, triangle and screen, respectively. Furthermore, RabbitMash computes the all-versus-all distances of 100 321 genomes in <5 min on a 40-core workstation, while Mash requires over 40 min.

AVAILABILITY AND IMPLEMENTATION

RabbitMash is available at https://github.com/ZekunYin/RabbitMash.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
