Instituto de Ciências Biológicas, Universidade Federal do Pará, Belém-PA, Brazil.
J Microbiol Methods. 2011 Aug;86(2):218-23. doi: 10.1016/j.mimet.2011.05.008. Epub 2011 May 18.
Owing to the advent of so-called Next-Generation Sequencing (NGS) technologies, the monetary and time costs of whole-genome sequencing have fallen by several orders of magnitude. Sequence reads can either be assembled by anchoring them directly onto an available reference genome (classical reference assembly) or be concatenated by overlap (de novo assembly). The latter strategy is preferable because it tends to preserve the architecture of the genome sequence. However, depending on the NGS platform used, short read lengths cause tremendous problems in the subsequent genome assembly phase, impeding closure of the entire genome sequence. To address this problem, we developed a multi-pronged hybrid de novo strategy combining De Bruijn graph and Overlap-Layout-Consensus methods, which we used to assemble from short reads the entire genome of Corynebacterium pseudotuberculosis strain I19, a bacterium of great importance in veterinary medicine that causes Caseous Lymphadenitis in ruminants, principally ovines and caprines. Briefly, contigs were assembled de novo from the short reads, and the reference genome was used only to orient them, by anchoring. Remaining gaps were closed by iteratively anchoring short reads to the gap flanks ("craning"). Finally, we compare the genome sequence assembled with our hybrid strategy to a classical reference assembly built from the same input data, and show that, when a reference genome is available, it pays off to use the hybrid de novo strategy rather than a classical reference assembly, because more of the genome sequence is preserved with the former.
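The three stages described above (de novo contig building with a De Bruijn graph, reference-guided orientation by anchoring, and iterative gap closing from the flanks) can be illustrated with a toy sketch. This is not the authors' pipeline: the function names (`debruijn_contigs`, `orient_and_order`, `close_gap`), the parameters `k` and `min_overlap`, and the error-free toy reads are hypothetical. Exact string search stands in for a real read aligner, and the gap-closing step is a greedy overlap extension, not a full Overlap-Layout-Consensus assembler.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def debruijn_contigs(reads, k):
    """Step 1 (de novo): nodes are (k-1)-mers, edges are k-mers seen in the
    reads; each maximal non-branching path (unitig) is reported as a contig.
    Simplifications (hypothetical): forward strand only, no error or
    coverage filtering, isolated cycles are not reported."""
    kmers = {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}
    out_edges = defaultdict(set)
    in_deg = defaultdict(int)
    for km in kmers:
        out_edges[km[:-1]].add(km[1:])
        in_deg[km[1:]] += 1
    nodes = sorted(set(out_edges) | set(in_deg))

    def one_in_one_out(node):
        return in_deg[node] == 1 and len(out_edges[node]) == 1

    contigs = []
    for node in nodes:
        if one_in_one_out(node):
            continue  # interior node: reached by walking from a unitig start
        for nxt in sorted(out_edges[node]):
            path, cur = node + nxt[-1], nxt
            while one_in_one_out(cur):
                cur = next(iter(out_edges[cur]))
                path += cur[-1]
            contigs.append(path)
    return contigs


def orient_and_order(contigs, reference):
    """Step 2: the reference is used only to orient and order contigs.
    Each contig (or, failing that, its reverse complement) is anchored by
    exact search; contig sequence is never altered. Unanchored contigs are
    returned separately rather than discarded."""
    placed, unplaced = [], []
    for contig in contigs:
        for candidate in (contig, revcomp(contig)):
            pos = reference.find(candidate)
            if pos != -1:
                placed.append((pos, candidate))
                break
        else:
            unplaced.append(contig)
    placed.sort()
    return [c for _, c in placed], unplaced


def close_gap(left, right, reads, min_overlap=4, max_rounds=100):
    """Step 3: grow the left flank with reads whose prefix matches its end,
    until the first min_overlap bases of the right flank appear in the newly
    added sequence. Greedy, no consensus. Returns None if the gap stays open."""
    seq = left
    anchor = right[:min_overlap]
    for _ in range(max_rounds):
        # search only where new bases were added, not inside the left flank
        idx = seq.find(anchor, max(0, len(left) - min_overlap + 1))
        if idx != -1:
            return seq[:idx] + right  # splice the right flank in
        tail = seq[-min_overlap:]
        extensions = []
        for read in reads:
            i = read.find(tail)
            # the read's prefix must match the flank end and reach beyond it
            if (i != -1 and seq.endswith(read[:i + min_overlap])
                    and i + min_overlap < len(read)):
                extensions.append(read[i + min_overlap:])
        if not extensions:
            return None
        seq += max(extensions, key=len)  # greedy: take the longest extension
    return None
```

With a hypothetical 30-bp reference `ATGGCGTACCTTAGACGGATCCAAGTTCGA`, tiling 8-bp reads over positions 0 to 13 and 18 to 29 with `k=5` yields two contigs; if the second is supplied reverse-complemented, `orient_and_order` flips it back and places it after the first, and `close_gap` bridges the 4-bp gap only when reads spanning it are in the pool. The design keeps the property the hybrid strategy relies on: the reference contributes orientation and order, never sequence.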