Chan Zuckerberg Biohub, San Francisco, CA, USA.
Gladstone Institutes of Data Science and Biotechnology, San Francisco, CA, USA.
Genome Biol. 2023 Aug 10;24(1):186. doi: 10.1186/s13059-023-03030-8.
Existing single nucleotide polymorphism (SNP) genotyping algorithms do not scale for species with thousands of sequenced strains, nor do they account for conspecific redundancy. Here we present a bioinformatics tool, Maast, which empowers population genetic meta-analysis of microbes at an unrivaled scale. Maast implements a novel algorithm to heuristically identify a minimal set of diverse conspecific genomes, then constructs a reliable SNP panel for each species, and enables rapid and accurate genotyping using a hybrid of whole-genome alignment and k-mer exact matching. We demonstrate Maast's utility by genotyping thousands of Helicobacter pylori strains and tracking SARS-CoV-2 diversification.
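As a rough illustration of the k-mer exact matching step, the sketch below genotypes a toy SNP panel from reads. It is not Maast's implementation: the function names (`occurrences`, `build_kmer_index`, `genotype`), the 0-based SNP coordinates, the tiny k, and the single-strand reads are all assumptions made for the example, and the whole-genome alignment and genome selection steps are omitted.

```python
from collections import Counter

def occurrences(seq, kmer):
    """Count (possibly overlapping) occurrences of kmer in seq."""
    return sum(1 for i in range(len(seq) - len(kmer) + 1) if seq.startswith(kmer, i))

def build_kmer_index(reference, snps, k=5):
    """Map each k-mer that covers a SNP site to its (position, allele).

    snps is a hypothetical panel: {0-based position: [alleles]}.
    K-mers that occur more than once in the sequence, or that would point
    to more than one (position, allele) pair, are dropped as uninformative.
    """
    candidates = {}
    for pos, alleles in snps.items():
        for allele in alleles:
            seq = reference[:pos] + allele + reference[pos + 1:]
            first = max(0, pos - k + 1)
            last = min(pos, len(seq) - k)
            for start in range(first, last + 1):
                kmer = seq[start:start + k]
                if occurrences(seq, kmer) == 1:
                    candidates.setdefault(kmer, set()).add((pos, allele))
    return {kmer: next(iter(hits))
            for kmer, hits in candidates.items() if len(hits) == 1}

def genotype(reads, index, k=5):
    """Call the best-supported allele per SNP site by exact k-mer lookup."""
    counts = {}
    for read in reads:
        for i in range(len(read) - k + 1):
            hit = index.get(read[i:i + k])
            if hit is not None:
                pos, allele = hit
                counts.setdefault(pos, Counter())[allele] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in counts.items()}

# Toy data (made up for illustration): one SNP at position 6, alleles C/T.
reference = "ACGTTGCAAGGCTA"
index = build_kmer_index(reference, {6: ["C", "T"]})
reads = ["GTTGTAAGG", "TTGTAAG", "TGCAAGG"]
print(genotype(reads, index))  # {6: 'T'}
```

Because lookups are exact dictionary hits, no read alignment is needed at genotyping time, which is the property that makes this style of approach fast. A production tool would also need to handle reverse complements, sequencing errors, and k-mers shared with other genomic regions or species.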