Accelerating next generation sequencing data analysis: an evaluation of optimized best practices for Genome Analysis Toolkit algorithms.

Author information

Franke Karl R, Crowgey Erin L

Affiliation

Department of Pediatrics, Nemours Alfred I duPont Hospital for Children, Wilmington, DE 19803, USA.

Publication information

Genomics Inform. 2020 Mar;18(1):e10. doi: 10.5808/GI.2020.18.1.e10. Epub 2020 Mar 31.

Abstract

Advancements in next generation sequencing (NGS) technologies have significantly increased the translational use of genomics data in the medical field, as well as the demand for computational infrastructure capable of processing those data. To improve current understanding of the software and hardware used to process large-scale human genomic (NGS) datasets, the performance and accuracy of optimized versions of GATK algorithms, including Parabricks and Sentieon, were compared with the results of the original application (GATK v4.1.0 on Intel x86 CPUs). Parabricks processed a 50× whole-genome sequencing library in under 3 h and Sentieon finished in under 8 h, whereas GATK v4.1.0 needed nearly 24 h. These results were achieved while maintaining greater than 99% accuracy and precision relative to stock GATK. Sentieon's somatic pipeline achieved similar results, also greater than 99%. Additionally, the IBM POWER9 CPU performed well on bioinformatic workloads when tested with 10 different alignment/mapping tools.
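
To make the idea of "accuracy and precision relative to stock GATK" concrete, the Python sketch below computes precision and recall of one pipeline's variant calls against another's, treating the stock GATK calls as the reference set. This is an assumption: the abstract does not state how these metrics were computed, and the `Variant` type, the `concordance` function, and the toy call sets are hypothetical. Matching on exact (chrom, pos, ref, alt) is also a simplification; dedicated comparison tools such as RTG Tools vcfeval additionally reconcile different representations of the same variant.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record: one allele at one position."""
    chrom: str
    pos: int
    ref: str
    alt: str


def concordance(reference: Iterable[Variant],
                query: Iterable[Variant]) -> Tuple[float, float]:
    """Return (precision, recall) of `query` calls against `reference` calls.

    Assumption: the reference set (e.g. stock GATK output) is treated as truth,
    and two calls match only if chrom, pos, ref and alt are all identical.
    """
    ref_set: Set[Variant] = set(reference)
    query_set: Set[Variant] = set(query)
    tp = len(ref_set & query_set)   # called by both pipelines
    fp = len(query_set - ref_set)   # called only by the query pipeline
    fn = len(ref_set - query_set)   # missed by the query pipeline
    precision = tp / (tp + fp) if query_set else 0.0
    recall = tp / (tp + fn) if ref_set else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy, made-up call sets; not data from the study.
    stock_gatk = {
        Variant("chr1", 100, "A", "G"),
        Variant("chr1", 200, "C", "T"),
        Variant("chr2", 50, "G", "A"),
        Variant("chr2", 75, "T", "C"),
    }
    optimized = {
        Variant("chr1", 100, "A", "G"),
        Variant("chr1", 200, "C", "T"),
        Variant("chr2", 50, "G", "A"),
        Variant("chr3", 10, "A", "T"),
    }
    p, r = concordance(stock_gatk, optimized)
    print(f"precision={p:.2f} recall={r:.2f}")  # prints "precision=0.75 recall=0.75"
```

In the toy data, three calls are shared, one is unique to the optimized pipeline, and one is missed, so both metrics come out to 3/4. On real whole-genome call sets, figures above 0.99 would correspond to the level of agreement the abstract reports.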

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e83b/7120354/2441d9c0c652/gi-2020-18-1-e10f1.jpg