Comparison of GATK and DeepVariant by trio sequencing.

Affiliations

Department of Medical Genetics, National Taiwan University Hospital, 8 Chung-Shan South Road, Taipei, 10041, Taiwan.

Google Inc., 1600 Amphitheatre Pkwy, Mountain View, CA, 94043, USA.

Publication information

Sci Rep. 2022 Feb 2;12(1):1809. doi: 10.1038/s41598-022-05833-4.

DOI:10.1038/s41598-022-05833-4

PMID:35110657

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8810758/

Abstract

While next-generation sequencing (NGS) has transformed genetic testing, it generates large quantities of noisy data that require a significant amount of bioinformatics to generate useful interpretation. The accuracy of variant calling is therefore critical. Although GATK HaplotypeCaller is a widely used tool for this purpose, newer methods such as DeepVariant have shown higher accuracy in assessments of gold-standard samples for whole-genome sequencing (WGS) and whole-exome sequencing (WES), but a side-by-side comparison on clinical samples has not been performed. Trio WES was used to compare GATK (4.1.2.0) HaplotypeCaller and DeepVariant (v0.8.0). The performance of the two pipelines was evaluated according to the Mendelian error rate, transition-to-transversion (Ti/Tv) ratio, concordance rate, and pathological variant detection rate. Data from 80 trios were analyzed. The Mendelian error rate of the 77 biological trios calculated from the data by DeepVariant (3.09 ± 0.83%) was lower than that calculated from the data by GATK (5.25 ± 0.91%) (p < 0.001). DeepVariant also yielded a higher Ti/Tv ratio (2.38 ± 0.02) than GATK (2.04 ± 0.07) (p < 0.001), suggesting that DeepVariant proportionally called more true positives. The concordance rate between the 2 pipelines was 88.73%. Sixty-three disease-causing variants were detected in the 80 trios. Among them, DeepVariant detected 62 variants, and GATK detected 61 variants. The one variant called by DeepVariant but not GATK HaplotypeCaller might have been missed by GATK HaplotypeCaller due to low coverage. OTC exon 2 (139 bp) deletion was not detected by either method. Mendelian error rate calculation is an effective way to evaluate variant callers. By this method, DeepVariant outperformed GATK, while the two pipelines performed equally in other parameters.
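The three quantitative metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the genotype encoding (a pair of allele indices per sample, as in a VCF GT field), and the toy data are all hypothetical. The abstract does not state the exact denominators, so the sketch assumes that the Mendelian error rate is the fraction of trio sites where the child's genotype cannot be formed from one paternal and one maternal allele, and that concordance is the number of shared calls divided by the union of calls from both callers. It covers diploid autosomal sites only, and a true de novo variant would be counted as an error.

```python
from itertools import product

# Transitions are purine<->purine or pyrimidine<->pyrimidine substitutions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def is_mendelian_consistent(child, father, mother):
    """True if the child genotype can be built from one allele of each parent."""
    target = sorted(child)
    return any(sorted((f, m)) == target for f, m in product(father, mother))


def mendelian_error_rate(trio_genotypes):
    """Fraction of sites whose child genotype violates Mendelian inheritance.

    Assumed definition: errors / all sites genotyped in the whole trio.
    """
    errors = sum(
        not is_mendelian_consistent(c, f, m) for c, f, m in trio_genotypes
    )
    return errors / len(trio_genotypes)


def ti_tv_ratio(snvs):
    """Transitions divided by transversions for a list of (ref, alt) SNVs.

    Raises ZeroDivisionError if the list contains no transversions.
    """
    ti = sum((ref, alt) in TRANSITIONS for ref, alt in snvs)
    tv = len(snvs) - ti
    return ti / tv


def concordance_rate(calls_a, calls_b):
    """Shared calls over the union of calls (assumed definition)."""
    return len(calls_a & calls_b) / len(calls_a | calls_b)


# Hypothetical toy data: (child, father, mother) genotypes at four sites.
toy_trio = [
    ((0, 1), (0, 0), (1, 1)),  # consistent: 0 from father, 1 from mother
    ((1, 1), (0, 0), (0, 1)),  # error: father has no allele 1 to transmit
    ((0, 0), (0, 1), (0, 1)),  # consistent
    ((0, 1), (0, 0), (0, 0)),  # error: de novo variant or calling error
]
toy_snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]
caller_a = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
caller_b = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 75, "T", "C")}

print(mendelian_error_rate(toy_trio))
print(ti_tv_ratio(toy_snvs))
print(concordance_rate(caller_a, caller_b))
```

Because the Mendelian check needs only the trio's own genotypes, it can rank callers on clinical samples that have no gold-standard truth set, which is the reason the abstract gives for using it.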
