Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, California, USA.
PLoS One. 2010 Jun 8;5(6):e10922. doi: 10.1371/journal.pone.0010922.
State-of-the-art DNA sequencing technologies are transforming the life sciences because they generate nucleotide sequence information at a speed and in a quantity that traditional Sanger sequencing cannot approach. Genome sequencing is a principal application of this technology, where the ultimate goal is the complete sequence of the organism of interest. Due to the nature of the raw data produced by these technologies, a full genomic sequence attained without the aid of Sanger sequencing has yet to be demonstrated.
We have developed a four-phase strategy for using only next-generation sequencing technologies (Illumina and 454) to assemble a complete microbial genome de novo. We applied this approach to completely assemble the 3.7 Mb genome of a rare Geobacter variant (KN400) that is capable of unprecedented current production at an electrode. Two key components of our strategy enabled this result. First, we integrated the two data types early in the process to maximally leverage their complementary characteristics. Second, we used the output of different short-read assembly programs in a way that leverages the complementary nature of their different underlying algorithms or of their different implementations of the same underlying algorithm.
The significance of our result is that it demonstrates a general approach for maximizing the efficiency and success of genome assembly projects as new sequencing technologies and new assembly algorithms are introduced. The general approach is a meta-strategy, wherein sequencing data are integrated as early as possible and in particular ways, and wherein multiple assembly algorithms are judiciously applied such that the deficiencies of one are compensated for by another.