Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia.
Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia.
PLoS One. 2023 Aug 3;18(8):e0288371. doi: 10.1371/journal.pone.0288371. eCollection 2023.
Next-generation sequencing (NGS) technology represents a significant advance in genomics and medical diagnosis. Nevertheless, the time required for sequencing, data analysis, and variant interpretation is a bottleneck in using NGS in precision medicine. For accurate and efficient performance in clinical diagnostic laboratory practice, a consistent data analysis pipeline is necessary to avoid false variant calls and achieve optimum accuracy. This study compares the performance of two NGS data analysis pipeline components: short-read mapping (BWA-MEM and BWA-MEM2) and variant calling (GATK-HaplotypeCaller and DRAGEN-GATK). On Whole Exome Sequencing (WES) data, computational performance was assessed using several criteria, including mapping efficiency, variant calling performance, false-positive call rate, and run time. We examined four gold-standard WES data sets: Ashkenazim father (NA24149), Ashkenazim mother (NA24143), Ashkenazim son (NA24385), and Asian son (NA25631). In addition, eighteen exome samples with different read counts and coverage were used in the run-time assessment. Using BWA-MEM2 and DRAGEN-GATK, this study achieved faster and more accurate detection of SNVs and indels than the standard GATK Best Practices workflow. This systematic comparison will help the bioinformatics community develop more efficient and faster solutions for analyzing NGS data.
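The evaluation criteria named in the abstract can be made concrete with a short sketch. The Python below assumes the standard benchmarking definitions used when a pipeline's calls are compared against a truth set: precision = TP/(TP+FP), recall = TP/(TP+FN), and F1 as their harmonic mean. It also assumes the false-positive call rate is FP/(TP+FP), mapping efficiency is mapped reads divided by total reads, and run-time gains are reported as a speedup ratio. These formulas, the `VariantCounts` class, and all function names are illustrative assumptions, not the study's own code or its exact definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantCounts:
    """Counts from comparing a pipeline's calls against a truth set (hypothetical container)."""
    tp: int  # calls that match a truth variant
    fp: int  # calls not present in the truth set
    fn: int  # truth variants the pipeline missed


def precision(c: VariantCounts) -> float:
    """Fraction of calls that are correct: TP / (TP + FP)."""
    called = c.tp + c.fp
    return c.tp / called if called else 0.0


def recall(c: VariantCounts) -> float:
    """Fraction of truth variants recovered: TP / (TP + FN)."""
    truth = c.tp + c.fn
    return c.tp / truth if truth else 0.0


def f1_score(c: VariantCounts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def false_positive_call_rate(c: VariantCounts) -> float:
    """Assumed definition: share of calls that are false, FP / (TP + FP)."""
    called = c.tp + c.fp
    return c.fp / called if called else 0.0


def mapping_rate(mapped_reads: int, total_reads: int) -> float:
    """Assumed mapping efficiency: fraction of input reads that aligned."""
    return mapped_reads / total_reads if total_reads else 0.0


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate run is than the baseline run."""
    return baseline_seconds / candidate_seconds


if __name__ == "__main__":
    # Toy numbers for illustration only; these are not results from the study.
    snvs = VariantCounts(tp=90, fp=10, fn=10)
    print(f"precision={precision(snvs):.3f} recall={recall(snvs):.3f} "
          f"F1={f1_score(snvs):.3f} FP rate={false_positive_call_rate(snvs):.3f}")
    print(f"mapping rate={mapping_rate(95, 100):.2%}")
    print(f"speedup={speedup(120.0, 60.0):.1f}x")
```

In practice the TP/FP/FN counts would come from a truth-set comparison of each pipeline's VCF (computed separately for SNVs and indels), and the timings from runs on identical inputs, so that BWA-MEM versus BWA-MEM2 and HaplotypeCaller versus DRAGEN-GATK are compared on equal terms.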