Song Weizhi, Thomas Torsten, Edwards Richard J
School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia; Centre for Marine Bio-Innovation, University of New South Wales, Sydney, Australia.
Centre for Marine Bio-Innovation, University of New South Wales, Sydney, Australia; School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, NSW, Australia.
Mar Genomics. 2019 Dec;48:100687. doi: 10.1016/j.margen.2019.05.002. Epub 2019 May 23.
High-quality, completed genomes are important for understanding the functions of marine bacteria. PacBio sequencing technology provides a powerful way to obtain high-quality completed genomes. However, individual library production is currently still costly, limiting the utility of the PacBio system for high-throughput genomics. Here we investigate how to generate high-quality genomes from pooled marine bacterial genomes.
Pooled genomic DNA from 10 marine bacteria was subjected to a single library production and sequenced with eight SMRT cells on the PacBio RS II sequencing platform. In total, 7.35 Gbp of long-read data was generated, which is equivalent to an average coverage of approximately 168× for the input genomes. Genome assembly showed that eight genomes with average nucleotide identities (ANI) lower than 91.4% can be assembled with high quality and completeness using standard assembly algorithms (e.g. HGAP or Canu). A reference-based read phasing step was developed and incorporated to assemble the complete genomes of the remaining two marine bacteria that had an ANI > 97% and whose initial assemblies were highly fragmented.
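The coverage figure above follows from simple arithmetic: average coverage is the total sequenced bases divided by the combined size of the input genomes. The sketch below is illustrative only. The function names are hypothetical, and the combined genome size is not reported in the abstract; it is back-calculated here from the stated 7.35 Gbp and 168×. The calculation also assumes reads are spread across the pool in proportion to genome size, which a real pooled library may not satisfy.

```python
def average_coverage(total_bases: float, genome_sizes: list[float]) -> float:
    """Average coverage = total sequenced bases / combined genome size."""
    return total_bases / sum(genome_sizes)


def implied_total_genome_size(total_bases: float, coverage: float) -> float:
    """Combined genome size implied by a data yield and an average coverage."""
    return total_bases / coverage


TOTAL_BASES = 7.35e9      # 7.35 Gbp of long-read data (from the abstract)
REPORTED_COVERAGE = 168   # approximate average coverage (from the abstract)
N_GENOMES = 10            # pooled genomes (from the abstract)

# Back-calculated values, not figures reported by the authors.
implied_total = implied_total_genome_size(TOTAL_BASES, REPORTED_COVERAGE)
implied_mean = implied_total / N_GENOMES

print(f"Implied combined genome size: {implied_total / 1e6:.2f} Mbp")
print(f"Implied mean genome size:     {implied_mean / 1e6:.2f} Mbp")

# Round trip: ten genomes of the implied mean size give back the stated coverage.
print(f"Coverage check: {average_coverage(TOTAL_BASES, [implied_mean] * N_GENOMES):.0f}x")
```

The same calculation run forward, from a list of expected genome sizes and a target coverage, gives a rough estimate of how much data a pooled run needs.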
Ten complete high-quality genomes of marine bacteria were generated. The findings and developments made here, including the reference-based read phasing approach for the assembly of highly similar genomes, can be used in the future to design strategies to sequence pooled genomes using long-read sequencing.