Derakhshani Hooman, Bernier Steve P, Marko Victoria A, Surette Michael G
Farncombe Family Digestive Health Research Institute, McMaster University, Hamilton, Ontario, Canada.
Department of Medicine, McMaster University, Hamilton, Ontario, Canada.
BMC Genomics. 2020 Jul 29;21(1):519. doi: 10.1186/s12864-020-06910-6.
Illumina technology currently dominates bacterial genomics due to its high read accuracy and low sequencing cost. However, the incompleteness of draft genomes generated from Illumina reads limits their application in comprehensive genomic analyses. Alternatively, hybrid assembly using both Illumina short reads and long reads generated by single-molecule sequencing technologies can enable assembly of complete bacterial genomes, yet the high per-genome cost of long-read sequencing limits the widespread use of this approach in bacterial genomics. Here, we developed a protocol for hybrid assembly of complete bacterial genomes using miniaturized multiplexed Illumina sequencing and non-barcoded PacBio sequencing of a synthetic genomic pool (SGP), thus significantly decreasing the overall per-genome cost of sequencing.
We evaluated the performance of SGP hybrid assembly on the genomes of 20 bacterial isolates with different genome sizes, a wide range of GC contents, and varying levels of phylogenetic relatedness. By improving the contiguity of Illumina assemblies, SGP hybrid assembly generated 17 complete and 3 nearly complete bacterial genomes. Increased contiguity of SGP hybrid assemblies resulted in considerable improvement in gene prediction and annotation. In addition, SGP hybrid assembly was able to resolve repeat elements and identify intragenomic heterogeneities, e.g. different copies of 16S rRNA genes, that would otherwise go undetected by short-read-only assembly. Comprehensive comparison of SGP hybrid assemblies with those generated using multiplexed PacBio long reads (long-read-only assembly) also revealed the relative advantage of SGP hybrid assembly in terms of assembly quality. In particular, we observed that SGP hybrid assemblies were completely devoid of both small (i.e. single base substitutions) and large assembly errors. Finally, we show the ability of SGP hybrid assembly to differentiate genomes of closely related bacterial isolates, suggesting its potential application in comparative genomics and pangenome analysis.
Our results indicate the superiority of SGP hybrid assembly over both short-read and long-read assemblies with respect to completeness, contiguity, accuracy, and recovery of small replicons. By lowering the per-genome cost of sequencing, our parallel sequencing and hybrid assembly pipeline could serve as a cost-effective and high-throughput approach for completing high-quality bacterial genomes.