USDA Dairy Forage Research Center, Madison, WI, USA.
Department of Computer Science and Engineering, University of California - San Diego, La Jolla, CA, USA.
Nat Biotechnol. 2022 May;40(5):711-719. doi: 10.1038/s41587-021-01130-z. Epub 2022 Jan 3.
Microbial communities might include distinct lineages of closely related organisms that complicate metagenomic assembly and prevent the generation of complete metagenome-assembled genomes (MAGs). Here we show that deep sequencing using long (HiFi) reads combined with Hi-C binning can address this challenge even for complex microbial communities. Using existing methods, we sequenced the sheep fecal metagenome and identified 428 MAGs with more than 90% completeness, including 44 MAGs in single circular contigs. To resolve closely related strains (lineages), we developed MAGPhase, which separates lineages of related organisms by discriminating variant haplotypes across hundreds of kilobases of genomic sequence. MAGPhase identified 220 lineage-resolved MAGs in our dataset. The ability to resolve closely related microbes in complex microbial communities improves the identification of biosynthetic gene clusters and the precision of assigning mobile genetic elements to host genomes. We identified 1,400 complete and 350 partial biosynthetic gene clusters, most of which are novel, and used Hi-C data to identify 424 potential host-viral and 298 potential host-plasmid associations.