K-mer-based Approaches to Bridging Pangenomics and Population Genetics.

Author information

Roberts Miles D, Davis Olivia, Josephs Emily B, Williamson Robert J

Affiliations

Genetics and Genome Sciences Program, Michigan State University, East Lansing, MI 48824, USA.

Department of Computer Science and Software Engineering, Rose-Hulman Institute of Technology, Terre Haute, IN 47803, USA.

Publication information

Mol Biol Evol. 2025 Mar 5;42(3). doi: 10.1093/molbev/msaf047.

Abstract

Many commonly studied species now have more than one chromosome-scale genome assembly, revealing a large amount of genetic diversity previously missed by approaches that map short reads to a single reference. However, many species still lack multiple reference genomes, and correctly aligning references to build pangenomes can be challenging, limiting our ability to study this missing genomic variation in population genetics. Here, we argue that k-mers are a very useful but underutilized tool for bridging the reference-focused paradigms of population genetics with the reference-free paradigms of pangenomics. We review current literature on the uses of k-mers for performing three core components of most population genetics analyses: identifying, measuring, and explaining patterns of genetic variation. We also demonstrate how different k-mer-based measures of genetic variation behave in population genetic simulations according to the choice of k, depth of sequencing coverage, and degree of data compression. Overall, we find that k-mer-based measures of genetic diversity scale consistently with pairwise nucleotide diversity (π) up to values of about π = 0.025 (R² = 0.97) for neutrally evolving populations. For populations with even more variation, using shorter k-mers maintains this scaling up to at least π = 0.1. Furthermore, in our simulated populations, k-mer dissimilarity values can be reliably approximated from counting Bloom filters, highlighting a potential avenue for decreasing the memory burden of k-mer-based genomic dissimilarity analyses. For future studies, there is a great opportunity to further develop methods for identifying selected loci using k-mers.
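To make the idea of a k-mer dissimilarity concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not say which measures were tested, so the Jaccard and Bray-Curtis dissimilarities used here are assumptions, chosen because they are common choices for comparing k-mer sets and k-mer counts. The `CountingBloomFilter` class, its parameters, and all function names are hypothetical and only illustrate how approximate counts could replace an exact k-mer table.

```python
from collections import Counter
import hashlib


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in a sequence (no canonicalization)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def jaccard_dissimilarity(a: Counter, b: Counter) -> float:
    """1 - (shared k-mers / all k-mers), using presence/absence only."""
    union = set(a) | set(b)
    if not union:
        return 0.0
    return 1.0 - len(set(a) & set(b)) / len(union)


def bray_curtis_dissimilarity(a: Counter, b: Counter) -> float:
    """1 - 2 * (sum of minimum counts) / (sum of all counts)."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    shared = sum(min(a[kmer], b[kmer]) for kmer in set(a) & set(b))
    return 1.0 - 2.0 * shared / total


class CountingBloomFilter:
    """Fixed-size array of counters (illustrative, hypothetical design).

    Hash collisions can inflate a reported count, but a count is never
    lower than the true number of insertions of that item.
    """

    def __init__(self, size: int, num_hashes: int = 3):
        self.size = size
        self.num_hashes = num_hashes
        self.counters = [0] * size

    def _slots(self, item: str):
        # Derive several hash functions by salting BLAKE2b with the seed.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.size

    def add(self, item: str) -> None:
        for slot in self._slots(item):
            self.counters[slot] += 1

    def count(self, item: str) -> int:
        # The smallest counter is the tightest upper bound on the true count.
        return min(self.counters[slot] for slot in self._slots(item))


if __name__ == "__main__":
    a = count_kmers("ACGTACGT", 3)
    b = count_kmers("ACGTTCGT", 3)
    print(jaccard_dissimilarity(a, b), bray_curtis_dissimilarity(a, b))
```

In this sketch, memory use of the filter is set by `size` rather than by the number of distinct k-mers, which is the trade-off the abstract alludes to: a smaller array saves memory at the cost of more collisions and therefore noisier counts.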

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0998/11925024/47935360b137/msaf047f1.jpg