Hardware acceleration of BWA-MEM genomic short read mapping for longer read lengths.

Affiliations

Computer Engineering Lab, TU Delft, Mekelweg 4, 2628 CD Delft, The Netherlands; Bluebee, Laan van Zuid Hoorn 57, 2289 DC Rijswijk, The Netherlands.

Bluebee, Laan van Zuid Hoorn 57, 2289 DC Rijswijk, The Netherlands.

Publication information

Comput Biol Chem. 2018 Aug;75:54-64. doi: 10.1016/j.compbiolchem.2018.03.024. Epub 2018 Apr 12.

DOI:10.1016/j.compbiolchem.2018.03.024

PMID:29747076

Abstract

We present our work on hardware accelerated genomics pipelines, using either FPGAs or GPUs to accelerate execution of BWA-MEM, a widely-used algorithm for genomic short read mapping. The mapping stage can take up to 40% of overall processing time for genomics pipelines. Our implementation offloads the Seed Extension function, one of the main BWA-MEM computational functions, onto an accelerator. Sequencers typically output reads with a length of 150 base pairs. However, read length is expected to increase in the near future. Here, we investigate the influence of read length on BWA-MEM performance using data sets with read length up to 400 base pairs, and introduce methods to ameliorate the impact of longer read length. For the industry-standard 150 base pair read length, our implementation achieves an up to two-fold increase in overall application-level performance for systems with at most twenty-two logical CPU cores. Longer read length requires commensurately bigger data structures, which directly impacts accelerator efficiency. The two-fold performance increase is sustained for read length of at most 250 base pairs. To improve performance, we perform a classification of the inefficiency of the underlying systolic array architecture. By eliminating idle regions as much as possible, efficiency is improved by up to +95%. Moreover, adaptive load balancing intelligently distributes work between host and accelerator to ensure use of an accelerator always results in performance improvement, which in GPU-constrained scenarios provides up to +45% more performance.
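The adaptive load balancing mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the authors' actual policy, so everything below is an assumption for illustration: the names `RateEstimate` and `split_batch` are hypothetical, and the policy shown (a proportional split driven by smoothed throughput measurements) is a generic technique, not the method from the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RateEstimate:
    """Smoothed throughput estimate (work items per second) for one device.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rate: float
    alpha: float = 0.5  # smoothing factor, an arbitrary illustrative choice

    def update(self, items_done: int, seconds: float) -> float:
        """Blend the newest measurement into the running estimate."""
        if seconds <= 0:
            raise ValueError("seconds must be positive")
        measured = items_done / seconds
        self.rate = self.alpha * measured + (1 - self.alpha) * self.rate
        return self.rate


def split_batch(n_items: int, host_rate: float, accel_rate: float) -> Tuple[int, int]:
    """Split a batch between host and accelerator in proportion to throughput.

    Returns (host_items, accel_items). A slow accelerator receives a small
    share, so offloading should not hold back the overall run.
    """
    if n_items < 0:
        raise ValueError("n_items must be non-negative")
    host_rate = max(host_rate, 0.0)
    accel_rate = max(accel_rate, 0.0)
    total = host_rate + accel_rate
    if total == 0:
        raise ValueError("at least one device must have a positive rate")
    accel_items = round(n_items * accel_rate / total)
    return n_items - accel_items, accel_items
```

Under these assumptions, `split_batch(1000, 1.0, 3.0)` returns `(250, 750)`: an accelerator measured at three times the host throughput receives three quarters of the work. Calling `update` after each batch lets the split follow changes in relative speed, for example when longer reads reduce accelerator efficiency.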

Similar articles

Hardware acceleration of BWA-MEM genomic short read mapping for longer read lengths.

Comput Biol Chem. 2018 Aug;75:54-64. doi: 10.1016/j.compbiolchem.2018.03.024. Epub 2018 Apr 12.

A hybrid short read mapping accelerator.

BMC Bioinformatics. 2013 Feb 26;14:67. doi: 10.1186/1471-2105-14-67.

Accelerating BWA-MEM Read Mapping on GPUs.

ICS. 2023 Jun;2023:155-166. doi: 10.1145/3577193.3593703. Epub 2023 Jun 21.

MICA: A fast short-read aligner that takes full advantage of Many Integrated Core Architecture (MIC).

BMC Bioinformatics. 2015;16 Suppl 7(Suppl 7):S10. doi: 10.1186/1471-2105-16-S7-S10. Epub 2015 Apr 23.

Arioc: High-concurrency short-read alignment on multiple GPUs.

PLoS Comput Biol. 2020 Nov 9;16(11):e1008383. doi: 10.1371/journal.pcbi.1008383. eCollection 2020 Nov.

Accelerating the Next Generation Long Read Mapping with the FPGA-Based System.

IEEE/ACM Trans Comput Biol Bioinform. 2014 Sep-Oct;11(5):840-52. doi: 10.1109/TCBB.2014.2326876.

MaxSSmap: a GPU program for mapping divergent short reads to genomes with the maximum scoring subsequence.

BMC Genomics. 2014 Nov 15;15(1):969. doi: 10.1186/1471-2164-15-969.

A nonvoxel-based dose convolution/superposition algorithm optimized for scalable GPU architectures.

Med Phys. 2014 Oct;41(10):101711. doi: 10.1118/1.4895822.

BWA-MEME: BWA-MEM emulated with a machine learning approach.

Bioinformatics. 2022 Apr 28;38(9):2404-2413. doi: 10.1093/bioinformatics/btac137.

FHAST: FPGA-Based Acceleration of Bowtie in Hardware.

IEEE/ACM Trans Comput Biol Bioinform. 2015 Sep-Oct;12(5):973-81. doi: 10.1109/TCBB.2015.2405333.

Cited by

Pig-to-human lung xenotransplantation into a brain-dead recipient.

Nat Med. 2025 Aug 25. doi: 10.1038/s41591-025-03861-x.

A novel loss-of-function variant causes asthenoteratozoospermia in infertile males.

Front Genet. 2025 May 13;16:1595720. doi: 10.3389/fgene.2025.1595720. eCollection 2025.

Decoding Lusichelins A-E: An In-Depth Look at the Metallophores of LEGE 07167-Structure, Production, and Functionality.

J Nat Prod. 2025 Jun 27;88(6):1319-1333. doi: 10.1021/acs.jnatprod.5c00204. Epub 2025 May 21.

Whole Genome Insights into Genetic Diversity, Introgression, and Adaptation of Hunan Cattle.

Animals (Basel). 2025 Apr 30;15(9):1287. doi: 10.3390/ani15091287.

Identification of the crucial circ-mi-mRNA interaction networks regulating testicular development and spermatogenesis in ganders.

Poult Sci. 2025 Mar;104(3):104863. doi: 10.1016/j.psj.2025.104863. Epub 2025 Feb 1.

Comprehensive analysis of the first complete mitogenome and plastome of a traditional Chinese medicine Viola diffusa.

BMC Genomics. 2024 Dec 2;25(1):1162. doi: 10.1186/s12864-024-11086-4.

Genomic characterization of sp. and : bacterial strains isolated from soil present near electronics manufacture industry for heavy metal remediation.

Microbiol Resour Announc. 2024 Sep 10;13(9):e0061724. doi: 10.1128/mra.00617-24. Epub 2024 Aug 20.

A genomic basis of vocal rhythm in birds.

Nat Commun. 2024 Apr 23;15(1):3095. doi: 10.1038/s41467-024-47305-5.

Integrated bioinformatics analysis of retinal ischemia/reperfusion injury in rats with potential key genes.

BMC Genomics. 2024 Apr 15;25(1):367. doi: 10.1186/s12864-024-10288-0.

Chromosomal mapping of a major genetic locus from Agropyron cristatum chromosome 6P that influences grain number and spikelet number in wheat.

Theor Appl Genet. 2024 Mar 15;137(4):82. doi: 10.1007/s00122-024-04584-2.
