Accelerating genomic workflows using NVIDIA Parabricks.

Affiliations

Health Data and AI, Deloitte Consulting LLP, Arlington, VA, 22009, USA.

Cloud Managed Services, Deloitte Consulting LLP, Detroit, MI, 48226, USA.

Publication information

BMC Bioinformatics. 2023 May 31;24(1):221. doi: 10.1186/s12859-023-05292-2.

DOI:10.1186/s12859-023-05292-2

PMID:37259021

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10230726/

Abstract

BACKGROUND

As genome sequencing becomes better integrated into scientific research, government policy, and personalized medicine, the primary challenge for researchers is shifting from generating raw data to analyzing these vast datasets. Although much work has been done to reduce compute times using various configurations of traditional CPU computing infrastructures, Graphics Processing Units (GPUs) offer opportunities to accelerate genomic workflows by orders of magnitude. Here we benchmark one GPU-accelerated software suite called NVIDIA Parabricks on Amazon Web Services (AWS), Google Cloud Platform (GCP), and an NVIDIA DGX cluster. We benchmarked six variant calling pipelines, including two germline callers (HaplotypeCaller and DeepVariant) and four somatic callers (Mutect2, Muse, LoFreq, SomaticSniper).

RESULTS

We achieved up to 65× acceleration with germline variant callers, bringing HaplotypeCaller runtimes down from 36 h to 33 min on AWS, 35 min on GCP, and 24 min on the NVIDIA DGX. Somatic callers varied more across GPU counts and computing platforms. On cloud platforms, GPU-accelerated germline callers resulted in cost savings compared with CPU runs, whereas some somatic callers were more expensive than CPU runs because their GPU acceleration was not sufficient to offset the higher cost of GPU instances.
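The speedup and cost reasoning in the results can be illustrated with a short Python sketch. The 36 h and 33 min runtimes come from the abstract; the hourly prices and the second, weakly accelerated example are hypothetical placeholders chosen only to show the arithmetic, not figures from the study.

```python
def speedup(cpu_minutes: float, gpu_minutes: float) -> float:
    """Ratio of CPU runtime to GPU runtime."""
    return cpu_minutes / gpu_minutes


def run_cost(runtime_minutes: float, hourly_price: float) -> float:
    """Cost of one run, billed in proportion to runtime."""
    return runtime_minutes / 60.0 * hourly_price


def gpu_is_cheaper(cpu_minutes: float, gpu_minutes: float,
                   cpu_hourly_price: float, gpu_hourly_price: float) -> bool:
    """True when the GPU run costs less than the CPU run."""
    return (run_cost(gpu_minutes, gpu_hourly_price)
            < run_cost(cpu_minutes, cpu_hourly_price))


# Runtimes reported in the abstract for HaplotypeCaller on AWS.
CPU_MINUTES = 36 * 60
AWS_GPU_MINUTES = 33

aws_speedup = speedup(CPU_MINUTES, AWS_GPU_MINUTES)
print(f"AWS HaplotypeCaller speedup: {aws_speedup:.1f}x")

# Hypothetical hourly prices (assumptions, not values from the paper).
CPU_PRICE = 1.0
GPU_PRICE = 30.0

# Large speedup: the GPU run is cheaper despite the higher hourly price.
print(gpu_is_cheaper(CPU_MINUTES, AWS_GPU_MINUTES, CPU_PRICE, GPU_PRICE))

# Hypothetical caller with only a 2x speedup: the GPU run costs more.
print(gpu_is_cheaper(60, 30, CPU_PRICE, GPU_PRICE))
```

Under this simple model, a GPU run is cheaper only when the speedup exceeds the ratio of GPU to CPU hourly price, which is consistent with the observation that weakly accelerated somatic callers can cost more on GPUs.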

CONCLUSIONS

Germline variant callers scaled well with the number of GPUs across platforms, whereas for somatic variant callers the number of GPUs that gave the fastest runtime varied more, suggesting that, at least with the version of Parabricks used here, these workflows are less GPU optimized and require benchmarking on the platform of choice before being deployed at production scales. Our study demonstrates that GPUs can be used to greatly accelerate genomic workflows, bringing urgent societal advances in biosurveillance and personalized medicine closer to reach.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6fca/10230726/d624950211ca/12859_2023_5292_Fig1_HTML.jpg
