The Platinum Pedigree: a long-read benchmark for genetic variants.

Authors

Kronenberg Zev, Nolan Cillian, Porubsky David, Mokveld Tom, Rowell William J, Lee Sangjin, Dolzhenko Egor, Chang Pi-Chuan, Holt James M, Saunders Christopher T, Olson Nathan D, Steely Cody J, McGee Sean, Guarracino Andrea, Koundinya Nidhi, Harvey William T, Watkins W Scott, Munson Katherine M, Hoekzema Kendra, Chua Khi Pin, Chen Xiao, Fanslow Cairbre, Lambert Christine, Dashnow Harriet, Garrison Erik, Smith Joshua D, Lansdorp Peter M, Zook Justin M, Carroll Andrew, Jorde Lynn B, Neklason Deborah W, Quinlan Aaron R, Eichler Evan E, Eberle Michael A

Affiliations

PacBio, Menlo Park, CA, USA.

Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.

Publication information

Nat Methods. 2025 Aug;22(8):1669-1676. doi: 10.1038/s41592-025-02750-y. Epub 2025 Aug 4.

DOI: 10.1038/s41592-025-02750-y
PMID: 40759746

Abstract

Recent advances in genome sequencing have improved variant calling in complex regions of the human genome. However, it is difficult to quantify variant calling performance because existing standards often focus on specificity, neglecting completeness in difficult-to-analyze regions. To create a more comprehensive truth set, we used Mendelian inheritance in a large pedigree (CEPH-1463) to filter variants across PacBio high-fidelity (HiFi), Illumina and Oxford Nanopore Technologies platforms. This generated a variant map with over 4.7 million single-nucleotide variants, 767,795 insertions and deletions (indels), 537,486 tandem repeats and 24,315 structural variants, covering 2.77 Gb of the GRCh38 genome. This work adds ~200 Mb of high-confidence regions, including 8% more small variants, and introduces the first tandem repeat and structural variant truth sets for NA12878 and her family. As an example of the value of this improved benchmark, we retrained DeepVariant using these data to reduce genotyping errors by ~34%.
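The abstract describes using Mendelian inheritance in a pedigree to filter variants. As a rough illustration of the underlying idea only, the sketch below checks whether a child's unphased diploid genotype can be formed from one allele of each parent and drops sites that fail. It is a simplified trio-level check, not the pipeline the authors used across the full CEPH-1463 pedigree; the function names, the dictionary layout, and the example sites are all hypothetical.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """Return True if the child's unphased diploid genotype can be formed
    by taking one allele from the mother and one from the father.

    Genotypes are tuples of allele indices, e.g. (0, 1) for a heterozygote.
    This is a simplified illustration (autosomal, diploid, no missing calls).
    """
    # Every (maternal allele, paternal allele) combination, order ignored.
    possible = {tuple(sorted(pair)) for pair in product(mother, father)}
    return tuple(sorted(child)) in possible


def filter_sites(sites):
    """Keep only sites whose trio genotypes pass the consistency check."""
    return [
        s for s in sites
        if is_mendelian_consistent(s["child"], s["mother"], s["father"])
    ]


# Hypothetical example sites (positions and genotypes are made up).
sites = [
    {"pos": 100, "child": (0, 1), "mother": (0, 0), "father": (0, 1)},
    {"pos": 200, "child": (1, 1), "mother": (0, 0), "father": (0, 1)},
    {"pos": 300, "child": (0, 1), "mother": (1, 1), "father": (0, 0)},
]

kept = filter_sites(sites)
print([s["pos"] for s in kept])
```

In this toy input, the site at position 200 is dropped because a homozygous-alternate child cannot inherit an alternate allele from a homozygous-reference mother. A real pedigree-based filter would also need to handle phasing, missing genotypes, sex chromosomes, and genuine de novo mutations, none of which this sketch attempts.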



