Personalized pangenome references.

Affiliations

UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA.

University of Ferrara, Ferrara, Italy.

Publication information

Nat Methods. 2024 Nov;21(11):2017-2023. doi: 10.1038/s41592-024-02407-2. Epub 2024 Sep 11.

DOI: 10.1038/s41592-024-02407-2
PMID: 39261641
Abstract

Pangenomes reduce reference bias by representing genetic diversity better than a single reference sequence. Yet when comparing a sample to a pangenome, variants in the pangenome that are not part of the sample can be misleading, for example, causing false read mappings. These irrelevant variants are generally rarer in terms of allele frequency, and have previously been dealt with by filtering rare variants. However, this blunt heuristic both fails to remove some irrelevant variants and removes many relevant variants. We propose a new approach that imputes a personalized pangenome subgraph by sampling local haplotypes according to k-mer counts in the reads. We implement the approach in the vg toolkit (https://github.com/vgteam/vg) for the Giraffe short-read aligner and compare its accuracy to state-of-the-art methods using human pangenome graphs from the Human Pangenome Reference Consortium. This reduces small variant genotyping errors by four times relative to the Genome Analysis Toolkit and makes short-read structural variant genotyping of known variants competitive with long-read variant discovery methods.
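The idea of sampling local haplotypes according to k-mer counts in the reads can be illustrated with a toy sketch. This is not the vg implementation: the function names, the +1/-1 scoring rule, the `min_count` threshold, and the example sequences are all assumptions chosen for illustration. The sketch counts k-mers in the reads, scores each candidate haplotype in one local block by the k-mers that distinguish it from the other candidates, and keeps the top-scoring haplotypes.

```python
from collections import Counter


def count_read_kmers(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_set(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sample_haplotypes(haplotypes, read_counts, k, n, min_count=1):
    """Pick the n local haplotypes best supported by the read k-mers.

    haplotypes: dict mapping a (hypothetical) haplotype name to its sequence
    read_counts: Counter of k-mers observed in the reads
    Scoring rule (an illustrative assumption): each informative k-mer
    adds 1 if it was seen at least min_count times, otherwise subtracts 1.
    """
    hap_kmers = {name: kmer_set(seq, k) for name, seq in haplotypes.items()}
    # k-mers shared by every candidate cannot tell haplotypes apart.
    shared = set.intersection(*hap_kmers.values())

    def score(name):
        informative = hap_kmers[name] - shared
        return sum(1 if read_counts[km] >= min_count else -1
                   for km in informative)

    # Highest score first; ties are broken by name for determinism.
    ranked = sorted(haplotypes, key=lambda name: (-score(name), name))
    return ranked[:n]


if __name__ == "__main__":
    # Three made-up haplotypes for one block; the reads come from hapB.
    block = {
        "hapA": "ACGTACGTTT",
        "hapB": "ACGTAGGTTT",
        "hapC": "ACGTACCTTT",
    }
    reads = ["ACGTAGG", "GTAGGTTT"]
    counts = count_read_kmers(reads, k=4)
    print(sample_haplotypes(block, counts, k=4, n=2))
```

Restricting the score to k-mers that differ between candidates keeps sequence common to all haplotypes from masking the signal. A real implementation would also have to account for sequencing coverage, heterozygous sites, and the stitching of chosen haplotypes across adjacent blocks, none of which this sketch attempts.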
