K-mer-based Approaches to Bridging Pangenomics and Population Genetics.

Author information

Roberts Miles D, Davis Olivia, Josephs Emily B, Williamson Robert J

Affiliations

Genetics and Genome Sciences Program, Michigan State University, East Lansing, MI 48824, USA.

Department of Computer Science and Software Engineering, Rose-Hulman Institute of Technology, Terre Haute, IN 47803, USA.

Publication information

Mol Biol Evol. 2025 Mar 5;42(3). doi: 10.1093/molbev/msaf047.

DOI: 10.1093/molbev/msaf047
PMID: 40111256
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11925024/
Abstract

Many commonly studied species now have more than one chromosome-scale genome assembly, revealing a large amount of genetic diversity previously missed by approaches that map short reads to a single reference. However, many species still lack multiple reference genomes, and correctly aligning references to build pangenomes can be challenging, limiting our ability to study this missing genomic variation in population genetics. Here, we argue that k-mers are a very useful but underutilized tool for bridging the reference-focused paradigms of population genetics with the reference-free paradigms of pangenomics. We review current literature on the uses of k-mers for performing three core components of most population genetics analyses: identifying, measuring, and explaining patterns of genetic variation. We also demonstrate how different k-mer-based measures of genetic variation behave in population genetic simulations according to the choice of k, depth of sequencing coverage, and degree of data compression. Overall, we find that k-mer-based measures of genetic diversity scale consistently with pairwise nucleotide diversity (π) up to values of about π = 0.025 (R² = 0.97) for neutrally evolving populations. For populations with even more variation, using shorter k-mers will maintain the scalability up to at least π = 0.1. Furthermore, in our simulated populations, k-mer dissimilarity values can be reliably approximated from counting Bloom filters, highlighting a potential avenue for decreasing the memory burden of k-mer-based genomic dissimilarity analyses. For future studies, there is a great opportunity to further develop methods for identifying selected loci using k-mers.
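
To make the idea of a k-mer-based dissimilarity concrete, the following is a minimal Python sketch. It is not the authors' pipeline: the abstract does not name the specific measures used, so Jaccard and Bray-Curtis dissimilarity are assumed here as two common choices, and the `CountingBloomFilter` class, its default size, and its hashing scheme are hypothetical illustrations of how approximate k-mer counts could stand in for exact ones.

```python
from collections import Counter
import hashlib


def count_kmers(seq, k):
    """Count every overlapping k-mer in seq (forward strand only)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def jaccard_dissimilarity(a, b):
    """1 - |shared k-mers| / |all k-mers|, using presence/absence only."""
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    if not union:
        return 0.0
    return 1.0 - len(keys_a & keys_b) / len(union)


def bray_curtis_dissimilarity(a, b):
    """1 - 2 * (sum of minimum counts) / (total counts in both samples)."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    shared = sum(min(a[x], b[x]) for x in a.keys() & b.keys())
    return 1.0 - 2.0 * shared / total


class CountingBloomFilter:
    """Hypothetical fixed-size approximate counter (assumption, not from the paper).

    Each item increments n_hashes counters; the estimate is the minimum of
    those counters, so it can overestimate but never underestimate.
    """

    def __init__(self, size=1 << 16, n_hashes=3):
        self.size = size
        self.n_hashes = n_hashes
        self.counters = [0] * size

    def _positions(self, item):
        # Derive n_hashes positions by salting BLAKE2b with the hash index.
        for seed in range(self.n_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.counters[pos] += 1

    def estimate(self, item):
        return min(self.counters[pos] for pos in self._positions(item))


# Toy sequences (made up for illustration), compared at k = 3.
sample_a = count_kmers("ACGTACGT", 3)
sample_b = count_kmers("ACGTTCGT", 3)
jaccard = jaccard_dissimilarity(sample_a, sample_b)
bray_curtis = bray_curtis_dissimilarity(sample_a, sample_b)

# Load the same k-mers into the approximate counter.
cbf = CountingBloomFilter()
for i in range(len("ACGTACGT") - 3 + 1):
    cbf.add("ACGTACGT"[i:i + 3])
```

In this sketch the filter's memory footprint is fixed by `size` regardless of how many distinct k-mers are inserted, which is the trade-off the abstract alludes to: lower memory in exchange for counts that may be inflated by hash collisions.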
