Bioinformatics and Systems Biology Program, ITMO University, St. Petersburg, Russia.
Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
Nat Methods. 2024 Nov;21(11):2034-2043. doi: 10.1038/s41592-024-02424-1. Epub 2024 Sep 26.
Bacterial species in microbial communities are often represented by mixtures of strains, distinguished by small variations in their genomes. Short-read approaches can be used to detect small-scale variation between strains but fail to phase these variants into contiguous haplotypes. Long-read metagenome assemblers can generate contiguous bacterial chromosomes but often suppress strain-level variation in favor of species-level consensus. Here we present Strainy, an algorithm for strain-level metagenome assembly and phasing from Nanopore and PacBio reads. Strainy takes a de novo metagenomic assembly as input and identifies strain variants, which are then phased and assembled into contiguous haplotypes. Using simulated and mock Nanopore and PacBio metagenome data, we show that Strainy assembles accurate and complete strain haplotypes, outperforming current Nanopore-based methods and performing comparably to PacBio-based algorithms in completeness and accuracy. We then use Strainy to assemble strain haplotypes of a complex environmental metagenome, revealing distinct strain distribution and mutational patterns in bacterial species.
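To make the "identify strain variants, then phase them into haplotypes" step more concrete, the toy sketch below detects variant positions from read alleles and groups reads into strain haplotypes. It is not Strainy's algorithm: the read representation (a dict from assembly position to observed base), the hypothetical `min_support` and `min_shared` thresholds, and the greedy union-find merge of conflict-free reads are all assumptions chosen for brevity. Naive transitive merging like this can chain together reads from different strains when their overlaps are sparse, so dedicated phasing tools typically rely on more robust clustering.

```python
from collections import Counter, defaultdict

# Toy read model (assumption): position on the assembly -> observed base.
Read = dict[int, str]


def find_variant_positions(reads: list[Read], min_support: int = 2) -> list[int]:
    """Return positions where at least two alleles are each seen in >= min_support reads."""
    counts: dict[int, Counter] = defaultdict(Counter)
    for read in reads:
        for pos, base in read.items():
            counts[pos][base] += 1
    variant_sites = []
    for pos, allele_counts in counts.items():
        supported = [b for b, c in allele_counts.items() if c >= min_support]
        if len(supported) >= 2:
            variant_sites.append(pos)
    return sorted(variant_sites)


def compatible(a: Read, b: Read, sites: list[int], min_shared: int = 1) -> bool:
    """Two reads are compatible if they share enough variant sites and agree at all of them."""
    shared = [p for p in sites if p in a and p in b]
    return len(shared) >= min_shared and all(a[p] == b[p] for p in shared)


def phase_reads(reads: list[Read], sites: list[int], min_shared: int = 1) -> list[list[int]]:
    """Greedily merge pairwise-compatible reads with union-find; return read-index clusters."""
    parent = list(range(len(reads)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(reads)):
        for j in range(i + 1, len(reads)):
            if compatible(reads[i], reads[j], sites, min_shared):
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = defaultdict(list)
    for i in range(len(reads)):
        groups[find(i)].append(i)
    return sorted(groups.values())


def consensus(reads: list[Read], cluster: list[int], sites: list[int]) -> dict[int, str]:
    """Majority allele at each variant site covered by the cluster's reads."""
    haplotype = {}
    for p in sites:
        alleles = Counter(reads[i][p] for i in cluster if p in reads[i])
        if alleles:
            haplotype[p] = alleles.most_common(1)[0][0]
    return haplotype


# Usage: two hypothetical strains differing at positions 10, 20 and 30.
reads: list[Read] = [
    {5: "A", 10: "A", 20: "C"},
    {20: "C", 30: "G"},
    {10: "A", 20: "C", 30: "G"},
    {5: "A", 10: "T", 20: "G"},
    {20: "G", 30: "C"},
    {10: "T", 20: "G", 30: "C"},
]
sites = find_variant_positions(reads)
clusters = phase_reads(reads, sites)
haplotypes = [consensus(reads, c, sites) for c in clusters]
print(sites, clusters, haplotypes)
```

Position 5 is not reported as a variant because only one allele is observed there, so it plays no role in phasing. Reads that disagree at any shared variant site never merge directly, which is what separates the two toy strains here.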