Faculty of Technology and Center for Biotechnology, Bielefeld University, Bielefeld, Germany.
Methods Mol Biol. 2024;2802:73-106. doi: 10.1007/978-1-0716-3838-5_4.
Computational pangenomics deals with the joint analysis of all genomic sequences of a species. It has already been applied successfully to various tasks in many research areas. Continuing advances in DNA sequencing technologies make ever more genomic sequences available for many species, which makes pangenomic studies increasingly attractive. At the same time, larger datasets pose new challenges for the data structures and algorithms needed to handle the data. Efficient methods often make use of the concept of k-mers. Core detection is a common way of analyzing a pangenome. The pangenome's core is defined as the subset of genomic information shared among all individual members. Classically, it is determined on the abstract level of genes, but it can also be described on the sequence level. In this chapter, we provide an overview of k-mer-based methods in the context of pangenomic studies. We first revisit existing software solutions for k-mer counting and k-mer set representation. Afterward, we describe the usage of two k-mer-based approaches, Pangrowth and Corer, for pangenomic core detection.
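To make the sequence-level definition of the core concrete, the following minimal Python sketch computes the k-mers shared by every genome in a small collection. It is a naive illustration of the definition only and is not how Pangrowth or Corer work. The function names (`canonical`, `kmer_set`, `core_kmers`), the use of canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement), and the toy sequences are assumptions made for this example.

```python
# Naive sketch: the "core" as the set of k-mers present in every genome.
# All names here are hypothetical and chosen for illustration only.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_set(sequence, k):
    """Collect the canonical k-mers of one sequence, skipping k-mers with non-ACGT symbols."""
    sequence = sequence.upper()
    kmers = set()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= set("ACGT"):
            kmers.add(canonical(kmer))
    return kmers


def core_kmers(genomes, k):
    """Intersect the k-mer sets of all genomes; an empty collection has an empty core."""
    sets = [kmer_set(genome, k) for genome in genomes]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    toy_genomes = ["ACGTAC", "CGTACG", "TACGTT"]  # made-up sequences
    print(sorted(core_kmers(toy_genomes, k=3)))
```

Holding one explicit k-mer set per genome in memory does not scale to large collections, which is one reason dedicated k-mer counting tools and compact k-mer set representations are used in practice.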