vcfdist: accurately benchmarking phased small variant calls in human genomes.

Affiliation

Computer Science and Engineering, University of Michigan, 2260 Hayward Street, Ann Arbor, MI, 48109, USA.

Publication information

Nat Commun. 2023 Dec 9;14(1):8149. doi: 10.1038/s41467-023-43876-x.

Abstract

Accurately benchmarking small variant calling accuracy is critical for the continued improvement of human whole genome sequencing. In this work, we show that current variant calling evaluations are biased towards certain variant representations and may misrepresent the relative performance of different variant calling pipelines. We propose solutions, first exploring the affine gap parameter design space for complex variant representation and suggesting a standard. Next, we present our tool vcfdist and demonstrate the importance of enforcing local phasing for evaluation accuracy. We then introduce the notion of partial credit for mostly-correct calls and present an algorithm for clustering dependent variants. Lastly, we motivate using alignment distance metrics to supplement precision-recall curves for understanding variant calling performance. We evaluate the performance of 64 phased Truth Challenge V2 submissions and show that vcfdist improves measured insertion and deletion performance consistency across variant representations from R = 0.97243 for baseline vcfeval to 0.99996 for vcfdist.
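The abstract refers to affine-gap costs for representing complex variants and to partial credit for mostly-correct calls. The Python sketch below illustrates both ideas. It is not vcfdist's implementation. `affine_gap_distance` computes a global affine-gap edit distance with Gotoh's dynamic program. The penalties (mismatch 3, gap open 2, gap extend 1) and the convention that a gap of length L costs `gap_open + gap_extend * L` are illustrative assumptions, not the standard proposed in the paper. `partial_credit` is a hypothetical scoring function: it measures how much of the reference-to-truth distance a query haplotype removes. It is one plausible formulation and not necessarily the one vcfdist uses.

```python
INF = float("inf")


def affine_gap_distance(a: str, b: str, mismatch: int = 3,
                        gap_open: int = 2, gap_extend: int = 1) -> int:
    """Minimum global alignment cost of a vs b under affine gap penalties.

    Matches are free, a mismatch costs `mismatch`, and a gap of length L
    costs gap_open + gap_extend * L (assumed convention; penalty values
    are illustrative, not vcfdist defaults).
    """
    n, m = len(a), len(b)
    # M: alignment of a[:i], b[:j] ending in a match/mismatch column
    # X: ending with a[i-1] aligned to a gap (deletion relative to b)
    # Y: ending with b[j-1] aligned to a gap (insertion relative to a)
    M = [[INF] * (m + 1) for _ in range(n + 1)]
    X = [[INF] * (m + 1) for _ in range(n + 1)]
    Y = [[INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + gap_extend * i   # one leading gap of length i
    for j in range(1, m + 1):
        Y[0][j] = gap_open + gap_extend * j   # one leading gap of length j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = sub + min(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a new gap pays gap_open; extending an open gap does not.
            X[i][j] = min(M[i - 1][j] + gap_open + gap_extend,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open + gap_extend)
            Y[i][j] = min(M[i][j - 1] + gap_open + gap_extend,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open + gap_extend)
    return int(min(M[n][m], X[n][m], Y[n][m]))


def partial_credit(ref: str, query: str, truth: str, **penalties) -> float:
    """Hypothetical partial-credit score for a query haplotype.

    Returns the fraction of the ref-to-truth distance that the query
    removes: 1.0 for an exact match, 0.0 if the query is no closer to
    the truth than the reference is.
    """
    d_ref = affine_gap_distance(ref, truth, **penalties)
    d_query = affine_gap_distance(query, truth, **penalties)
    if d_ref == 0:
        # Truth has no variant here: any difference is a false positive.
        return 1.0 if d_query == 0 else 0.0
    return max(0.0, (d_ref - d_query) / d_ref)
```

For example, suppose the truth inserts `GGGG` between `AAAA` and `CCCC` and the query inserts only `GG`. The reference is 6 from the truth (one gap of length 4) and the query is 4 from it (one gap of length 2), so the sketch awards (6 - 4) / 6 = 1/3 credit. The affine model also charges one 4-base gap (cost 6) less than four separate 1-base gaps (cost 12). That is why the choice of gap parameters shapes which representation of a complex variant is preferred.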

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/60d9/10710436/fd517bbe2e4c/41467_2023_43876_Fig1_HTML.jpg