Samarakoon Pubudu Saneth, Fournous Ghislain, Hansen Lars T, Wijesiri Ashen, Zhao Sen, Rodriguez Alex A, Nandi Tarak Nath, Madduri Ravi, Rowe Alexander D, Thomassen Gard, Hovig Eivind, Razick Sabry
Scientific Computing Services, Division for Research, Dissemination and Education, University of Oslo, Oslo, 0373, Norway.
Norwegian National Unit for Newborn Screening, Division for Pediatric and Adolescent Medicine, Oslo University Hospital, Oslo, 0450, Norway.
Bioinform Adv. 2025 May 15;5(1):vbaf085. doi: 10.1093/bioadv/vbaf085. eCollection 2025.
Industry-standard central processing unit (CPU)-based next-generation sequencing (NGS) analysis tools suffer from long runtimes, which limits their utility in time-sensitive clinical practice and population-scale research studies. To address this, researchers have developed accelerated NGS platforms such as DRAGEN and Parabricks, which have significantly reduced runtimes from days to hours. However, previous studies have evaluated accelerated platforms independently, without sufficiently assessing computational resource usage or thoroughly investigating speedup scalability, a gap our study is designed to address.
Corroborating previous studies, accelerated pipelines demonstrated shorter runtimes than CPU-only approaches, with Parabricks-H100 demonstrating the highest speedups, followed by DRAGEN. In mapping, DRAGEN outperformed Parabricks (L4 and A100) and matched Parabricks-H100 speedups. In variant calling, Parabricks (A100 and H100) demonstrated higher speedups than DRAGEN. Moreover, DRAGEN and Parabricks-H100 mapping showed positive trends in the coverage-based scalability analysis, while other configurations failed to scale effectively. Our profiler analysis provided new insights into the relationship between Parabricks' performance and its resource usage patterns, revealing potential for further improvement. Our findings and cost comparison help researchers select accelerated platforms based on coverage needs, timeframes, and budget, while suggesting optimization strategies.
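The speedup and coverage-based scalability comparisons above can be illustrated with a minimal sketch. It assumes speedup is defined as CPU-only runtime divided by accelerated runtime, and that a "positive trend" means the least-squares slope of speedup against coverage is greater than zero; both definitions are assumptions for illustration and may differ from those used in the study. The function names and all runtime values are hypothetical placeholders, not measurements from the paper.

```python
def speedup(cpu_runtime, accelerated_runtime):
    """Speedup of an accelerated run relative to a CPU-only run.

    Assumed definition: CPU-only runtime / accelerated runtime,
    with both runtimes in the same unit.
    """
    if cpu_runtime <= 0 or accelerated_runtime <= 0:
        raise ValueError("runtimes must be positive")
    return cpu_runtime / accelerated_runtime


def speedup_trend(coverages, cpu_runtimes, accelerated_runtimes):
    """Least-squares slope of speedup versus sequencing coverage.

    A slope above zero means speedup grows as coverage increases
    (taken here as a 'positive' scalability trend).
    """
    if not (len(coverages) == len(cpu_runtimes) == len(accelerated_runtimes)):
        raise ValueError("all inputs must have the same length")
    if len(coverages) < 2:
        raise ValueError("need at least two coverage levels")

    speedups = [speedup(c, a) for c, a in zip(cpu_runtimes, accelerated_runtimes)]
    n = len(coverages)
    mean_x = sum(coverages) / n
    mean_y = sum(speedups) / n
    sxx = sum((x - mean_x) ** 2 for x in coverages)
    if sxx == 0:
        raise ValueError("coverage levels must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(coverages, speedups))
    return sxy / sxx


# Hypothetical placeholder values (minutes), NOT results from the study.
coverages = [10, 20, 40]
cpu_minutes = [100, 200, 400]
accelerated_minutes = [10, 16, 25]

slope = speedup_trend(coverages, cpu_minutes, accelerated_minutes)
print("scales with coverage" if slope > 0 else "does not scale with coverage")
```

A configuration whose accelerated runtime grows in proportion to the CPU-only runtime yields a slope of zero under this definition, which corresponds to a constant speedup that does not improve with coverage.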
Datasets are described in the 'Data availability' section. Our NGS pipelines are available at https://github.com/NAICNO/accelerated_genomics.