vcfdist: accurately benchmarking phased small variant calls in human genomes.

Affiliation

Computer Science and Engineering, University of Michigan, 2260 Hayward Street, Ann Arbor, MI, 48109, USA.

Publication information

Nat Commun. 2023 Dec 9;14(1):8149. doi: 10.1038/s41467-023-43876-x.

DOI:10.1038/s41467-023-43876-x

PMID:38071244

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10710436/

Abstract

Accurately benchmarking small variant calling accuracy is critical for the continued improvement of human whole genome sequencing. In this work, we show that current variant calling evaluations are biased towards certain variant representations and may misrepresent the relative performance of different variant calling pipelines. We propose solutions, first exploring the affine gap parameter design space for complex variant representation and suggesting a standard. Next, we present our tool vcfdist and demonstrate the importance of enforcing local phasing for evaluation accuracy. We then introduce the notion of partial credit for mostly-correct calls and present an algorithm for clustering dependent variants. Lastly, we motivate using alignment distance metrics to supplement precision-recall curves for understanding variant calling performance. We evaluate the performance of 64 phased Truth Challenge V2 submissions and show that vcfdist improves measured insertion and deletion performance consistency across variant representations from R = 0.97243 for baseline vcfeval to 0.99996 for vcfdist.
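The representation problem described in the abstract can be illustrated with a small, hypothetical example. The sketch below is not vcfdist's implementation: the toy reference sequence, the variant tuples, and the helper names `apply_variants` and `edit_distance` are all assumptions made for illustration, and a unit-cost edit distance stands in for the affine-gap alignment the paper discusses. It shows two call sets that share no variant records yet produce the same haplotype, so a record-by-record comparison would score them as disagreeing while a sequence-level comparison finds no difference.

```python
from typing import List, Tuple

# Hypothetical variant record: (0-based position, REF allele, ALT allele).
Variant = Tuple[int, str, str]


def apply_variants(reference: str, variants: List[Variant]) -> str:
    """Apply non-overlapping variants to a reference and return the haplotype."""
    pieces = []
    cursor = 0
    for pos, ref, alt in sorted(variants):
        if pos < cursor:
            raise ValueError("overlapping variants")
        if reference[pos:pos + len(ref)] != ref:
            raise ValueError(f"REF allele mismatch at position {pos}")
        pieces.append(reference[cursor:pos])  # unchanged sequence before the variant
        pieces.append(alt)                    # substituted allele
        cursor = pos + len(ref)
    pieces.append(reference[cursor:])
    return "".join(pieces)


def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance (a stand-in for an affine-gap alignment score)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return prev[-1]


reference = "ACGTACGT"

# Representation A: one complex substitution.
rep_a: List[Variant] = [(2, "GTA", "TAC")]
# Representation B: a 1-base deletion plus a 1-base insertion (VCF-style anchor bases).
rep_b: List[Variant] = [(1, "CG", "C"), (4, "A", "AC")]

hap_a = apply_variants(reference, rep_a)
hap_b = apply_variants(reference, rep_b)

shared_records = set(rep_a) & set(rep_b)  # empty: no record is identical
haplotype_gap = edit_distance(hap_a, hap_b)  # 0: the sequences are identical
```

Comparing reconstructed haplotypes rather than individual records is what makes the result independent of how a caller chose to write the variants; the distance from the reference to either haplotype (2 edits here) gives a representation-free measure of how much sequence actually changed.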

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/60d9/10710436/fd517bbe2e4c/41467_2023_43876_Fig1_HTML.jpg

Similar articles

vcfdist: accurately benchmarking phased small variant calls in human genomes.

Nat Commun. 2023 Dec 9;14(1):8149. doi: 10.1038/s41467-023-43876-x.

Jointly benchmarking small and structural variant calls with vcfdist.

Genome Biol. 2024 Oct 2;25(1):253. doi: 10.1186/s13059-024-03394-5.

Best practices for benchmarking germline small-variant calls in human genomes.

Nat Biotechnol. 2019 May;37(5):555-560. doi: 10.1038/s41587-019-0054-x. Epub 2019 Mar 11.

Benchmarking workflows to assess performance and suitability of germline variant calling pipelines in clinical diagnostic assays.

BMC Bioinformatics. 2021 Feb 24;22(1):85. doi: 10.1186/s12859-020-03934-3.

Benchmarking of variant calling software for whole-exome sequencing using gold standard datasets.

Sci Rep. 2025 Apr 21;15(1):13697. doi: 10.1038/s41598-025-97047-7.

Systematic benchmark of state-of-the-art variant calling pipelines identifies major factors affecting accuracy of coding sequence variant discovery.

BMC Genomics. 2022 Feb 22;23(1):155. doi: 10.1186/s12864-022-08365-3.

SMaSH: a benchmarking toolkit for human genome variant calling.

Bioinformatics. 2014 Oct;30(19):2787-95. doi: 10.1093/bioinformatics/btu345. Epub 2014 Jun 3.

Benchmarking reveals superiority of deep learning variant callers on bacterial nanopore sequence data.

Elife. 2024 Oct 10;13:RP98300. doi: 10.7554/eLife.98300.

Combining accurate tumor genome simulation with crowdsourcing to benchmark somatic structural variant detection.

Genome Biol. 2018 Nov 6;19(1):188. doi: 10.1186/s13059-018-1539-5.

An open resource for accurately benchmarking small variant and reference calls.

Nat Biotechnol. 2019 May;37(5):561-566. doi: 10.1038/s41587-019-0074-6. Epub 2019 Apr 1.

Cited by

BVSim: A benchmarking variation simulator mimicking human variation spectrum.

Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf095.

Genome graphs reveal the importance of structural variation in evolution and drug resistance.

bioRxiv. 2025 May 7:2025.05.07.652570. doi: 10.1101/2025.05.07.652570.

Are reads required? High-precision variant calling from bacterial genome assemblies.

Access Microbiol. 2025 May 28;7(5). doi: 10.1099/acmi.0.001025.v3. eCollection 2025.

Phasing nanopore genome assembly by integrating heterozygous variations and Hi-C data.

Bioinformatics. 2024 Nov 28;40(12). doi: 10.1093/bioinformatics/btae712.

The GIAB genomic stratifications resource for human reference genomes.

Nat Commun. 2024 Oct 19;15(1):9029. doi: 10.1038/s41467-024-53260-y.

Benchmarking reveals superiority of deep learning variant callers on bacterial nanopore sequence data.

Elife. 2024 Oct 10;13:RP98300. doi: 10.7554/eLife.98300.

Jointly benchmarking small and structural variant calls with vcfdist.

Genome Biol. 2024 Oct 2;25(1):253. doi: 10.1186/s13059-024-03394-5.

CIEVaD: A Lightweight Workflow Collection for the Rapid and On-Demand Deployment of End-to-End Testing for Genomic Variant Detection.

Viruses. 2024 Sep 11;16(9):1444. doi: 10.3390/v16091444.

VCF observer: a user-friendly software tool for preliminary VCF file analysis and comparison.

BMC Bioinformatics. 2024 Sep 3;25(1):290. doi: 10.1186/s12859-024-05860-0.

Analysis and benchmarking of small and large genomic variants across tandem repeats.

Nat Biotechnol. 2025 Mar;43(3):431-442. doi: 10.1038/s41587-024-02225-z. Epub 2024 Apr 26.

References

nPoRe: n-polymer realigner for improved pileup-based variant calling.

BMC Bioinformatics. 2023 Mar 16;24(1):98. doi: 10.1186/s12859-023-05193-4.

Telomere-to-telomere assembly of diploid chromosomes with Verkko.

Nat Biotechnol. 2023 Oct;41(10):1474-1482. doi: 10.1038/s41587-023-01662-6. Epub 2023 Feb 16.

Optimal gap-affine alignment in O(s) space.

Bioinformatics. 2023 Feb 3;39(2). doi: 10.1093/bioinformatics/btad074.

Benchmarking challenging small variants with linked and long reads.

Cell Genom. 2022 May;2(5). doi: 10.1016/j.xgen.2022.100128.

Semi-automated assembly of high-quality diploid human reference genomes.

Nature. 2022 Nov;611(7936):519-531. doi: 10.1038/s41586-022-05325-5. Epub 2022 Oct 19.

Clair3-trio: high-performance Nanopore long-read variant calling in family trios with trio-to-trio deep neural networks.

Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac301.

PrecisionFDA Truth Challenge V2: Calling variants from short and long reads in difficult-to-map regions.

Cell Genom. 2022 May 11;2(5). doi: 10.1016/j.xgen.2022.100129. Epub 2022 Apr 27.

The GA4GH Variation Representation Specification: A computational framework for variation representation and federated identification.

Cell Genom. 2021 Nov 10;1(2). doi: 10.1016/j.xgen.2021.100027.

Curated variation benchmarks for challenging medically relevant autosomal genes.

Nat Biotechnol. 2022 May;40(5):672-680. doi: 10.1038/s41587-021-01158-1. Epub 2022 Feb 7.

Twelve years of SAMtools and BCFtools.

Gigascience. 2021 Feb 16;10(2). doi: 10.1093/gigascience/giab008.
