Department of Fundamental Microbiology, UNIL, Lausanne, Switzerland.
Department of Computational Biology, UNIL, Lausanne, Switzerland.
Genome Biol. 2024 Oct 14;25(1):270. doi: 10.1186/s13059-024-03414-4.
The exponential increase in sequencing data calls for conceptual and computational advances to extract useful biological insights. One such advance, minimizers, reduces the quantity of data handled while preserving some of its key properties. We provide a basic introduction to minimizers, cover recent methodological developments, and review the diverse applications of minimizers in genomic data analysis, including de novo genome assembly, metagenomics, read alignment, read correction, and pangenomes. We also touch on alternative data sketching techniques, including universal hitting sets, syncmers, and strobemers. Minimizers and their alternatives have rapidly become indispensable tools for handling vast amounts of data.
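As a rough illustration of how minimizers reduce the data handled, the sketch below selects, in every window of w consecutive k-mers, the smallest k-mer. It assumes plain lexicographic ordering with ties broken by the leftmost position; practical tools often order k-mers by a hash instead. The function name `minimizers` and the parameters `k` and `w` are illustrative choices, not taken from the article.

```python
def minimizers(seq, k, w):
    """Return sorted (position, k-mer) pairs chosen as window minimizers.

    Assumption: k-mers are compared lexicographically, and ties are
    broken by taking the leftmost occurrence in the window.
    """
    if k <= 0 or w <= 0:
        raise ValueError("k and w must be positive")

    # All overlapping k-mers of the sequence, indexed by start position.
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]

    selected = set()
    # Slide a window covering w consecutive k-mers.
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        # Smallest k-mer in the window; position breaks ties.
        best = min(window, key=lambda i: (kmers[i], i))
        selected.add((best, kmers[best]))

    return sorted(selected)


if __name__ == "__main__":
    for pos, kmer in minimizers("ACGTTGCA", k=3, w=3):
        print(pos, kmer)
```

Adjacent windows frequently share the same minimizer, so the selected set is usually smaller than the full list of k-mers, while any two sequences sharing a stretch of at least w + k - 1 identical bases are guaranteed to share at least one selected k-mer.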