Computational & Systems Biology, Genome Institute of Singapore, Singapore, Singapore.
Faculty of Electrical Engineering and Computing, Department of Electronic Systems and Information Processing, University of Zagreb, Zagreb, Croatia.
Nat Biotechnol. 2019 Aug;37(8):937-944. doi: 10.1038/s41587-019-0191-2. Epub 2019 Jul 29.
Characterization of microbiomes has been enabled by high-throughput metagenomic sequencing. However, existing methods are not designed to combine reads from short- and long-read technologies. We present a hybrid metagenomic assembler named OPERA-MS that integrates assembly-based metagenome clustering with repeat-aware, exact scaffolding to accurately assemble complex communities. Evaluation using defined in vitro and virtual gut microbiomes revealed that OPERA-MS assembles metagenomes with greater base pair accuracy than long-read (>5×; Canu), higher contiguity than short-read (~10× NGA50; MEGAHIT, IDBA-UD, metaSPAdes) and fewer assembly errors than non-metagenomic hybrid assemblers (2×; hybridSPAdes). OPERA-MS provides strain-resolved assembly in the presence of multiple genomes of the same species, high-quality reference genomes for rare species (<1%) with ~9× long-read coverage and near-complete genomes with higher coverage. We used OPERA-MS to assemble 28 gut metagenomes of antibiotic-treated patients, and showed that the inclusion of long nanopore reads produces more contiguous assemblies (200× improvement over short-read assemblies), including more than 80 closed plasmid or phage sequences and a new 263 kbp jumbo phage. High-quality hybrid assemblies enable an exquisitely detailed view of the gut resistome in human patients.