Department of Medical Genetics, National Taiwan University Hospital, 8 Chung-Shan South Road, Taipei, 10041, Taiwan.
Google Inc., 1600 Amphitheatre Pkwy, Mountain View, CA, 94043, USA.
Sci Rep. 2022 Feb 2;12(1):1809. doi: 10.1038/s41598-022-05833-4.
Although next-generation sequencing (NGS) has transformed genetic testing, it generates large quantities of noisy data that require substantial bioinformatic processing before they can be interpreted. The accuracy of variant calling is therefore critical. GATK HaplotypeCaller is widely used for this purpose, and newer methods such as DeepVariant have shown higher accuracy in assessments of gold-standard samples for whole-genome sequencing (WGS) and whole-exome sequencing (WES), but a side-by-side comparison on clinical samples has not been performed. Trio WES data were used to compare GATK (4.1.2.0) HaplotypeCaller and DeepVariant (v0.8.0). The performance of the two pipelines was evaluated by Mendelian error rate, transition-to-transversion (Ti/Tv) ratio, concordance rate, and pathogenic variant detection rate. Data from 80 trios were analyzed. Among the 77 biological trios, the Mendelian error rate was lower for DeepVariant calls (3.09 ± 0.83%) than for GATK calls (5.25 ± 0.91%) (p < 0.001). DeepVariant also yielded a higher Ti/Tv ratio (2.38 ± 0.02) than GATK (2.04 ± 0.07) (p < 0.001), suggesting that a larger proportion of DeepVariant calls were true positives. The concordance rate between the two pipelines was 88.73%. Sixty-three disease-causing variants were identified in the 80 trios; DeepVariant detected 62 of them and GATK detected 61. The one variant called by DeepVariant but not by GATK HaplotypeCaller may have been missed because of low coverage. A deletion of OTC exon 2 (139 bp) was not detected by either method. Mendelian error rate calculation is an effective way to evaluate variant callers. By this measure, DeepVariant outperformed GATK, while the two pipelines performed equally on the other parameters.
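As an illustration of two of the metrics named above, the following is a minimal sketch of how a Ti/Tv ratio and a per-trio Mendelian error rate could be computed. It is not the authors' pipeline: the abstract does not say which tool or denominator was used, so the function names, the input layout (genotypes as pairs of allele indices), and the choice to divide by all sites with complete trio genotypes are assumptions made for illustration. The sketch also assumes autosomal diploid sites with no missing calls.

```python
from itertools import product

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """Return True for purine<->purine or pyrimidine<->pyrimidine changes.

    Assumes ref and alt are single, distinct bases from A/C/G/T.
    """
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(snvs):
    """Compute transitions / transversions over (ref, alt) SNV pairs."""
    snvs = list(snvs)
    ti = sum(1 for ref, alt in snvs if is_transition(ref, alt))
    tv = len(snvs) - ti
    return ti / tv if tv else float("inf")


def is_mendelian_consistent(child, father, mother):
    """Check whether the child's genotype can be formed by taking
    one allele from the father and one from the mother.

    Genotypes are unphased pairs of allele indices, e.g. (0, 1).
    """
    target = sorted(child)
    return any(sorted((f, m)) == target for f, m in product(father, mother))


def mendelian_error_rate(sites):
    """Fraction of sites whose trio genotypes violate Mendelian inheritance.

    `sites` is an iterable of (child, father, mother) genotype triples.
    The denominator (all supplied sites) is an assumption of this sketch.
    """
    sites = list(sites)
    if not sites:
        return 0.0
    errors = sum(
        1 for child, father, mother in sites
        if not is_mendelian_consistent(child, father, mother)
    )
    return errors / len(sites)


# Hypothetical toy data, not taken from the study.
example_snvs = [("A", "G"), ("C", "T"), ("A", "C")]
example_trio = [
    ((0, 1), (0, 0), (0, 1)),  # consistent: 0 from father, 1 from mother
    ((1, 1), (0, 0), (0, 1)),  # violation: father cannot transmit allele 1
    ((0, 1), (0, 0), (0, 0)),  # violation: allele 1 absent in both parents
    ((0, 0), (0, 1), (0, 1)),  # consistent: 0 from each parent
]
```

A site flagged as a violation here may reflect either a genotyping error or a true de novo variant; because de novo events are rare, the aggregate rate is commonly read as a proxy for calling error, which is how the abstract uses it.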