Accelerating BWA-MEM Read Mapping on GPUs.

Authors

Pham Minh, Tu Yicheng, Lv Xiaoyi

Affiliations

University of South Florida, Tampa, FL, USA.

Xinjiang University, Ürümqi, China.

Publication information

ICS. 2023 Jun;2023:155-166. doi: 10.1145/3577193.3593703. Epub 2023 Jun 21.

DOI:10.1145/3577193.3593703
PMID:37584044
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10425913/
Abstract

Advancements in Next-Generation Sequencing (NGS) have significantly reduced the cost of generating DNA sequence data and increased the speed of data production. However, such high-throughput data production has increased the need for efficient data analysis programs. One of the most computationally demanding steps in analyzing sequencing data is mapping short reads produced by NGS to a reference DNA sequence, such as a human genome. The mapping program BWA-MEM and its newer version BWA-MEM2, optimized for CPUs, are some of the most popular choices for this task. In this study, we discuss the implementation of BWA-MEM on GPUs. This is a challenging task because many algorithms and data structures in BWA-MEM do not execute efficiently on the GPU architecture. This paper identifies major challenges in developing efficient GPU code on all major stages of the BWA-MEM program, including seeding, seed chaining, Smith-Waterman alignment, memory management, and I/O handling. We conduct comparison experiments against BWA-MEM and BWA-MEM2 running on a 64-thread CPU. The results show that our implementation achieved up to 3.2x speedup over BWA-MEM2 and up to 5.8x over BWA-MEM when using an NVIDIA A40. Using an NVIDIA A6000 and an NVIDIA A100, we achieved a wall-time speedup of up to 3.4x/3.8x over BWA-MEM2 and up to 6.1x/6.8x over BWA-MEM, respectively. In stage-wise comparison, the A40/A6000/A100 GPUs respectively achieved up to 3.7/3.8/4x, 2/2.3/2.5x, and 3.1/5/7.9x speedup on the three major stages of BWA-MEM: seeding and seed chaining, Smith-Waterman, and making SAM output. To the best of our knowledge, this is the first study that attempts to implement the entire BWA-MEM program on GPUs.
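For readers unfamiliar with the Smith-Waterman stage named in the abstract, the following is a minimal CPU-side sketch of local alignment scoring. It is not the authors' GPU kernel: the function name `smith_waterman_score`, the linear gap penalty, and the scoring values are assumptions chosen for illustration (BWA-MEM itself uses affine gap penalties and banded extension).

```python
def smith_waterman_score(read, ref, match=1, mismatch=-4, gap=-6):
    """Return the best local alignment score between read and ref.

    Illustrative only: linear gap penalty, hypothetical scoring values.
    """
    # prev holds the previous row of the dynamic-programming matrix.
    prev = [0] * (len(ref) + 1)
    best = 0
    for i in range(1, len(read) + 1):
        curr = [0] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            # Each cell depends on its upper-left, upper, and left neighbours;
            # clamping at 0 is what makes the alignment local.
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "TTACGTTT"))
```

Because each cell depends only on its left, upper, and upper-left neighbours, the cells on a single anti-diagonal are independent of one another, which is one reason this recurrence is a common target for parallel hardware.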

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91bb/10425913/89fe0630526a/nihms-1917800-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91bb/10425913/a79216e4a44a/nihms-1917800-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91bb/10425913/9cfb8d81bb00/nihms-1917800-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91bb/10425913/40d79a2ee7c4/nihms-1917800-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91bb/10425913/57b6dfceb93b/nihms-1917800-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91bb/10425913/e6a801d25f68/nihms-1917800-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91bb/10425913/0973f8952392/nihms-1917800-f0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91bb/10425913/1fc5c64c1e0b/nihms-1917800-f0008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91bb/10425913/026b800ca909/nihms-1917800-f0009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91bb/10425913/233227893dd7/nihms-1917800-f0010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91bb/10425913/1a07b86dd1e8/nihms-1917800-f0011.jpg

Similar articles

1
Accelerating BWA-MEM Read Mapping on GPUs.
ICS. 2023 Jun;2023:155-166. doi: 10.1145/3577193.3593703. Epub 2023 Jun 21.
2
BWA-MEME: BWA-MEM emulated with a machine learning approach.
Bioinformatics. 2022 Apr 28;38(9):2404-2413. doi: 10.1093/bioinformatics/btac137.
3
Hardware acceleration of BWA-MEM genomic short read mapping for longer read lengths.
Comput Biol Chem. 2018 Aug;75:54-64. doi: 10.1016/j.compbiolchem.2018.03.024. Epub 2018 Apr 12.
4
Accelerating Minimap2 for Accurate Long Read Alignment on GPUs.
J Biotechnol Biomed. 2023;6(1):13-23. doi: 10.26502/jbb.2642-91280067. Epub 2023 Jan 20.
5
Faster single-end alignment generation utilizing multi-thread for BWA.
Biomed Mater Eng. 2015;26 Suppl 1:S1791-6. doi: 10.3233/BME-151480.
6
Porting and Optimizing BWA-MEM2 Using the Fujitsu A64FX Processor.
IEEE/ACM Trans Comput Biol Bioinform. 2023 Sep-Oct;20(5):3139-3153. doi: 10.1109/TCBB.2023.3264514. Epub 2023 Oct 9.
7
MICA: A fast short-read aligner that takes full advantage of Many Integrated Core Architecture (MIC).
BMC Bioinformatics. 2015;16 Suppl 7(Suppl 7):S10. doi: 10.1186/1471-2105-16-S7-S10. Epub 2015 Apr 23.
8
Evaluation of an optimized germline exomes pipeline using BWA-MEM2 and Dragen-GATK tools.
PLoS One. 2023 Aug 3;18(8):e0288371. doi: 10.1371/journal.pone.0288371. eCollection 2023.
9
PipeMEM: A Framework to Speed Up BWA-MEM in Spark with Low Overhead.
Genes (Basel). 2019 Nov 4;10(11):886. doi: 10.3390/genes10110886.
10
Accelerating Coupled-Cluster Calculations with GPUs: An Implementation of the Density-Fitted CCSD(T) Approach for Heterogeneous Computing Architectures Using OpenMP Directives.
J Chem Theory Comput. 2023 Nov 14;19(21):7640-7657. doi: 10.1021/acs.jctc.3c00876. Epub 2023 Oct 25.

Cited by

1
18 individual genes underwent variant screening in a northwest Chinese group comprised 83 probands diagnosed with early-onset high myopia.
PLoS One. 2025 Sep 8;20(9):e0329472. doi: 10.1371/journal.pone.0329472. eCollection 2025.
2
A Framework Integrating GWAS and Genomic Selection to Enhance Prediction Accuracy of Economical Traits in Common Carp.
Int J Mol Sci. 2025 Jul 21;26(14):7009. doi: 10.3390/ijms26147009.
3
Detection of Selection Signatures and Genome-Wide Association Analysis of Body Weight Traits in Xianan Cattle.
Genes (Basel). 2025 May 30;16(6):682. doi: 10.3390/genes16060682.
4
Fast noisy long read alignment with multi-level parallelism.
BMC Bioinformatics. 2025 May 2;26(1):118. doi: 10.1186/s12859-025-06129-w.
5
BIRC3 RNA Editing Modulates Lipopolysaccharide-Induced Liver Inflammation: Potential Implications for Animal Health.
Int J Mol Sci. 2025 Mar 24;26(7):2941. doi: 10.3390/ijms26072941.
6
GRM1 as a Candidate Gene for Buffalo Fertility: Insights from Genome-Wide Association Studies and Its Role in the FOXO Signaling Pathway.
Genes (Basel). 2025 Feb 4;16(2):193. doi: 10.3390/genes16020193.
7
Genome-Wide Association Studies for Lactation Performance in Buffaloes.
Genes (Basel). 2025 Jan 27;16(2):163. doi: 10.3390/genes16020163.
8
Construction of AMPK-related circRNA network in mouse myocardial ischemia-reperfusion injury model.
BMC Cardiovasc Disord. 2024 Dec 30;24(1):759. doi: 10.1186/s12872-024-04387-9.

References

1
DNAscan: personal computer compatible NGS analysis, annotation and visualisation.
BMC Bioinformatics. 2019 Apr 27;20(1):213. doi: 10.1186/s12859-019-2791-8.
2
Hardware acceleration of BWA-MEM genomic short read mapping for longer read lengths.
Comput Biol Chem. 2018 Aug;75:54-64. doi: 10.1016/j.compbiolchem.2018.03.024. Epub 2018 Apr 12.
3
Introducing difference recurrence relations for faster semi-global alignment of long sequences.
BMC Bioinformatics. 2018 Feb 19;19(Suppl 1):45. doi: 10.1186/s12859-018-2014-8.
4
Alignment of Next-Generation Sequencing Reads.
Annu Rev Genomics Hum Genet. 2015;16:133-51. doi: 10.1146/annurev-genom-090413-025358. Epub 2015 May 4.
5
Fast gapped-read alignment with Bowtie 2.
Nat Methods. 2012 Mar 4;9(4):357-9. doi: 10.1038/nmeth.1923.
6
Comparative analysis of algorithms for next-generation sequencing read alignment.
Bioinformatics. 2011 Oct 15;27(20):2790-6. doi: 10.1093/bioinformatics/btr477. Epub 2011 Aug 19.
7
A survey of sequence alignment algorithms for next-generation sequencing.
Brief Bioinform. 2010 Sep;11(5):473-83. doi: 10.1093/bib/bbq015. Epub 2010 May 11.
8
The Sequence Alignment/Map format and SAMtools.
Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. Epub 2009 Jun 8.
9
SOAP2: an improved ultrafast tool for short read alignment.
Bioinformatics. 2009 Aug 1;25(15):1966-7. doi: 10.1093/bioinformatics/btp336. Epub 2009 Jun 3.
10
Striped Smith-Waterman speeds database searches six times over other SIMD implementations.
Bioinformatics. 2007 Jan 15;23(2):156-61. doi: 10.1093/bioinformatics/btl582. Epub 2006 Nov 16.