Accelerating genomic workflows using NVIDIA Parabricks.

Author information

Health Data and AI, Deloitte Consulting LLP, Arlington, VA, 22009, USA.

Cloud Managed Services, Deloitte Consulting LLP, Detroit, MI, 48226, USA.

Publication information

BMC Bioinformatics. 2023 May 31;24(1):221. doi: 10.1186/s12859-023-05292-2.

Abstract

BACKGROUND

As genome sequencing becomes better integrated into scientific research, government policy, and personalized medicine, the primary challenge for researchers is shifting from generating raw data to analyzing these vast datasets. Although much work has been done to reduce compute times using various configurations of traditional CPU computing infrastructures, Graphics Processing Units (GPUs) offer opportunities to accelerate genomic workflows by orders of magnitude. Here we benchmark one GPU-accelerated software suite called NVIDIA Parabricks on Amazon Web Services (AWS), Google Cloud Platform (GCP), and an NVIDIA DGX cluster. We benchmarked six variant calling pipelines, including two germline callers (HaplotypeCaller and DeepVariant) and four somatic callers (Mutect2, Muse, LoFreq, SomaticSniper).

RESULTS

We achieved up to 65× acceleration with germline variant callers, bringing HaplotypeCaller runtimes down from 36 h to 33 min on AWS, 35 min on GCP, and 24 min on the NVIDIA DGX. Somatic caller performance varied more with the number of GPUs and across computing platforms. On cloud platforms, GPU-accelerated germline callers were cheaper than CPU runs, whereas some somatic callers were more expensive than CPU runs because their GPU speedup was too small to offset the higher cost of GPU instances.
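
The speedup and cost reasoning above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' benchmarking code. The 36 h and 33 min figures come from the abstract (the abstract does not say whether the 36 h baseline applies to every platform, so only the AWS case is computed). The hourly prices and the 10× somatic example are invented placeholders, and the function names are hypothetical.

```python
def speedup(cpu_minutes: float, gpu_minutes: float) -> float:
    """Ratio of CPU runtime to GPU runtime."""
    return cpu_minutes / gpu_minutes


def gpu_is_cheaper(cpu_minutes: float, gpu_minutes: float,
                   cpu_price_per_hour: float, gpu_price_per_hour: float) -> bool:
    """True when the GPU run costs less than the CPU run.

    Cost is modeled as hourly price times runtime, so the GPU run wins
    exactly when the speedup exceeds the GPU/CPU price ratio.
    """
    cpu_cost = cpu_price_per_hour * cpu_minutes / 60
    gpu_cost = gpu_price_per_hour * gpu_minutes / 60
    return gpu_cost < cpu_cost


# Figures from the abstract: HaplotypeCaller, 36 h on CPU vs 33 min on AWS GPUs.
aws_speedup = speedup(36 * 60, 33)
print(round(aws_speedup, 1))  # 65.5, consistent with "up to 65x"

# Hypothetical prices (NOT from the paper): CPU at $1/h, GPU at $20/h.
CPU_PRICE, GPU_PRICE = 1.0, 20.0

# Germline case: a ~65x speedup exceeds the 20x price ratio, so GPU is cheaper.
print(gpu_is_cheaper(36 * 60, 33, CPU_PRICE, GPU_PRICE))  # True

# Invented somatic case: a 10x speedup (600 min -> 60 min) does not cover
# the 20x price ratio, so the GPU run costs more.
print(gpu_is_cheaper(600, 60, CPU_PRICE, GPU_PRICE))  # False
```

Under this simple model, the break-even point is a speedup equal to the ratio of GPU to CPU hourly price, which is why a modestly accelerated caller can end up costing more on GPU instances.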

CONCLUSIONS

Germline variant callers scaled well with the number of GPUs across platforms, whereas for somatic variant callers the number of GPUs that gave the fastest runtime varied more. This suggests that, at least with the version of Parabricks used here, the somatic workflows are less GPU optimized and should be benchmarked on the platform of choice before being deployed at production scale. Our study demonstrates that GPUs can greatly accelerate genomic workflows, bringing urgent societal advances in biosurveillance and personalized medicine closer within reach.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6fca/10230726/d624950211ca/12859_2023_5292_Fig1_HTML.jpg
