PipeMEM: A Framework to Speed Up BWA-MEM in Spark with Low Overhead.

Affiliation

Communication & Computer Network Lab of Guangdong, School of Computer Science & Engineering, South China University of Technology, Wushan Road 381, Guangzhou 51000, China.

Publication information

Genes (Basel). 2019 Nov 4;10(11):886. doi: 10.3390/genes10110886.

Abstract

(1) Background: DNA sequence alignment is an essential step in genome analysis. BWA-MEM has been a prevalent single-node tool for genome alignment because of its high speed and accuracy. Genome data are now generated at an exponentially growing rate, and handling such volumes requires a multi-node solution, which remains a challenge. Spark is a widely used big data platform that has been exploited to help genome alignment meet this challenge. Nonetheless, existing works that use Spark to optimize BWA-MEM suffer from high overhead. (2) Methods: In this paper, we present PipeMEM, a framework that accelerates BWA-MEM with lower overhead by using the pipe operation in Spark. We additionally propose a pipeline structure and in-memory computation to accelerate PipeMEM. (3) Results: Our experiments showed that, on paired-end alignment tasks, our framework had low overhead. In a multi-node environment, our framework was, on average, 2.27× faster than BWASpark (an alignment tool in the Genome Analysis Toolkit (GATK)) and 2.33× faster than SparkBWA. (4) Conclusions: PipeMEM can accelerate BWA-MEM in the Spark environment with high performance and low overhead.
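
The pipe operation mentioned in the Methods can be illustrated with a small sketch. Spark exposes it as `RDD.pipe`, which writes each partition's records to the standard input of an external command, one record per line, and collects the lines the command prints. The plain-Python code below imitates that behavior without Spark, as an assumption-laden illustration rather than PipeMEM's actual implementation. The helper names `pipe_partition` and `pipe_all` are hypothetical, and the stand-in command merely uppercases its input; it is not BWA-MEM.

```python
import subprocess
import sys


def pipe_partition(records, command):
    """Send one partition's records to an external process, one per line,
    and return the lines the process writes to standard output."""
    if not records:
        return []
    payload = "\n".join(records) + "\n"
    result = subprocess.run(
        command,
        input=payload,
        capture_output=True,
        text=True,
        check=True,  # raise if the external tool exits with an error
    )
    return result.stdout.splitlines()


def pipe_all(partitions, command):
    """Apply the same external command to every partition independently.
    (Spark would run these on different executors; here they run in sequence.)"""
    return [pipe_partition(part, command) for part in partitions]


# Hypothetical stand-in for an external aligner: a tiny Python script that
# reads lines from stdin and writes them back in upper case.
STAND_IN_COMMAND = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line.upper())",
]

# Two toy partitions of read-like strings (made-up data).
partitions = [["acgt", "ttga"], ["ggcc"]]
piped = pipe_all(partitions, STAND_IN_COMMAND)
print(piped)
```

Because the external tool is reached only through its standard input and output, it can run unmodified, which is one plausible reading of why a pipe-based design keeps integration overhead low.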

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ca49/6896194/dc79ea031495/genes-10-00886-g001.jpg
