The Platinum Pedigree: a long-read benchmark for genetic variants.

Authors

Kronenberg Zev, Nolan Cillian, Porubsky David, Mokveld Tom, Rowell William J, Lee Sangjin, Dolzhenko Egor, Chang Pi-Chuan, Holt James M, Saunders Christopher T, Olson Nathan D, Steely Cody J, McGee Sean, Guarracino Andrea, Koundinya Nidhi, Harvey William T, Watkins W Scott, Munson Katherine M, Hoekzema Kendra, Chua Khi Pin, Chen Xiao, Fanslow Cairbre, Lambert Christine, Dashnow Harriet, Garrison Erik, Smith Joshua D, Lansdorp Peter M, Zook Justin M, Carroll Andrew, Jorde Lynn B, Neklason Deborah W, Quinlan Aaron R, Eichler Evan E, Eberle Michael A

Affiliations

PacBio, Menlo Park, CA, USA.

Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.

Publication information

Nat Methods. 2025 Aug;22(8):1669-1676. doi: 10.1038/s41592-025-02750-y. Epub 2025 Aug 4.

Abstract

Recent advances in genome sequencing have improved variant calling in complex regions of the human genome. However, it is difficult to quantify variant calling performance because existing standards often focus on specificity, neglecting completeness in difficult-to-analyze regions. To create a more comprehensive truth set, we used Mendelian inheritance in a large pedigree (CEPH-1463) to filter variants across PacBio high-fidelity (HiFi), Illumina and Oxford Nanopore Technologies platforms. This generated a variant map with over 4.7 million single-nucleotide variants, 767,795 insertions and deletions (indels), 537,486 tandem repeats and 24,315 structural variants, covering 2.77 Gb of the GRCh38 genome. This work adds ~200 Mb of high-confidence regions, including 8% more small variants, and introduces the first tandem repeat and structural variant truth sets for NA12878 and her family. As an example of the value of this improved benchmark, we retrained DeepVariant using these data to reduce genotyping errors by ~34%.
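The core filtering idea in the abstract, keeping only variant calls that obey Mendelian inheritance, can be illustrated with a minimal sketch. This is not the authors' pipeline, which works across a large multi-generation pedigree and several sequencing platforms; it is a simplified single-trio check for autosomal diploid genotypes. The function names, the record layout and the site labels below are all hypothetical.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """Return True if the child's diploid genotype can be formed by
    taking one allele from the mother and one from the father.

    Genotypes are tuples of allele indices, e.g. (0, 1) for a
    heterozygous call. Assumes an autosomal, diploid site.
    """
    child_sorted = tuple(sorted(child))
    for m_allele, f_allele in product(mother, father):
        if tuple(sorted((m_allele, f_allele))) == child_sorted:
            return True
    return False


def filter_calls(calls):
    """Keep only the records whose child genotype is consistent
    with the parental genotypes."""
    return [
        c for c in calls
        if is_mendelian_consistent(c["child"], c["mother"], c["father"])
    ]


# Hypothetical example records (illustrative only, not real data).
calls = [
    {"site": "chr1:1000", "child": (0, 1), "mother": (0, 0), "father": (0, 1)},
    {"site": "chr1:2000", "child": (1, 1), "mother": (0, 0), "father": (0, 1)},
]
kept = filter_calls(calls)
```

In this toy input the second record is dropped, because a homozygous-alternate child cannot inherit an alternate allele from a homozygous-reference mother. A trio check like this cannot distinguish a genotyping error from a true de novo mutation, which is one reason a larger pedigree gives more filtering power than a single trio.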

