Southern Crop Protection and Food Research Centre, Agriculture and Agri-Food Canada, Government of Canada, London, ON, Canada; Department of Microbiology and Immunology, Schulich School of Medicine and Dentistry, University of Western Ontario, London, ON, Canada.
Front Microbiol. 2015 Jan 21;5:769. doi: 10.3389/fmicb.2014.00769. eCollection 2014.
Advances in sequencing technology have drastically increased the depth and feasibility of bacterial genome sequencing. However, despite the large number of published genomes, little information is available on the specific techniques and procedures employed during genome sequencing. The shotgun approaches employed by second-generation sequencing platforms have necessitated the development of robust bioinformatics tools for in silico assembly, and complete assembly is limited by the presence of repetitive DNA sequences and multi-copy operons. Typically, re-sequencing with multiple platforms and laborious, targeted Sanger sequencing are employed to finish a draft bacterial genome. Here we describe a novel strategy based on the identification and targeted sequencing of repetitive rDNA operons to expedite bacterial genome assembly and finishing. Our strategy was validated by finishing the genome of Paenibacillus polymyxa strain CR1, a bacterium with potential in sustainable agriculture and bio-based processes. An analysis of the 38 contigs in the P. polymyxa strain CR1 draft genome revealed 12 repetitive rDNA operons with varied intragenic and flanking regions of variable length, all located at contig boundaries and within contig gaps. These highly similar but not identical rDNA operons were experimentally verified and sequenced simultaneously with multiple, specially designed primer sets. This approach also identified and corrected a significant sequence rearrangement generated during the initial in silico assembly of sequencing reads. Our approach reduces the effort associated with blind primer walking for contig assembly, increasing both the speed and feasibility of genome finishing. Our study further reinforces the notion that repetitive DNA elements are major limiting factors for genome finishing. Moreover, we provide a step-by-step workflow for genome finishing, which may guide future bacterial genome finishing projects.
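The observation that rDNA operons sit at contig boundaries can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not specify the tools used, so the `RrnaHit` record, the `contig_ends_with_rrna` function, the 500 bp `edge_window`, and the toy coordinates are all hypothetical. The sketch assumes rRNA gene coordinates per contig are already available from some annotation step, and simply flags contig ends that fall in or near an rRNA gene, which would be candidate sites for targeted primer design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RrnaHit:
    """One rRNA gene annotation on a contig (1-based, inclusive coordinates)."""
    contig: str
    start: int
    end: int
    gene: str  # e.g. "16S", "23S", "5S"


def contig_ends_with_rrna(contig_lengths, hits, edge_window=500):
    """Map each contig to the ends ("left"/"right") that have an rRNA hit
    within `edge_window` bp. The window size is an assumed, tunable value."""
    flagged = {}
    for hit in hits:
        length = contig_lengths[hit.contig]
        # Distance from the first base of the contig to the start of the hit.
        if hit.start - 1 <= edge_window:
            flagged.setdefault(hit.contig, set()).add("left")
        # Distance from the end of the hit to the last base of the contig.
        if length - hit.end <= edge_window:
            flagged.setdefault(hit.contig, set()).add("right")
    return flagged


# Toy data invented for illustration only (not from the CR1 draft genome).
contig_lengths = {"contig_1": 120_000, "contig_2": 80_000, "contig_3": 50_000}
hits = [
    RrnaHit("contig_1", 119_200, 119_950, "16S"),  # near the right end
    RrnaHit("contig_2", 40, 1_500, "23S"),         # near the left end
    RrnaHit("contig_3", 20_000, 21_500, "16S"),    # interior, not flagged
]

flagged = contig_ends_with_rrna(contig_lengths, hits)
for contig, ends in sorted(flagged.items()):
    print(contig, sorted(ends))
```

In this toy input, only `contig_1` (right end) and `contig_2` (left end) are flagged. In a real project, a list like this would only suggest which contig pairs might be bridged by an rDNA operon; the joins themselves would still need experimental confirmation, as described in the abstract.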