Pevzner P A, Tang H, Waterman M S
Department of Computer Science and Engineering, University of California, San Diego, La Jolla, USA.
Proc Natl Acad Sci U S A. 2001 Aug 14;98(17):9748-53. doi: 10.1073/pnas.171285098.
For the last 20 years, fragment assembly in DNA sequencing followed the "overlap-layout-consensus" paradigm that is used in all currently available assembly tools. Although this approach proved useful in assembling clones, it faces difficulties in genomic shotgun assembly. We abandon the classical "overlap-layout-consensus" approach in favor of a new EULER algorithm that, for the first time, resolves the 20-year-old "repeat problem" in fragment assembly. Our main result is the reduction of the fragment assembly to a variation of the classical Eulerian path problem that allows one to generate accurate solutions of large-scale sequencing problems. EULER, in contrast to the CELERA assembler, does not mask such repeats but uses them instead as a powerful fragment assembly tool.
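The reduction to an Eulerian path problem mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes error-free reads, a single strand, full coverage, and that every k-mer occurs once in the target sequence. The function names (`de_bruijn_graph`, `eulerian_path`, `assemble`) and the toy reads are hypothetical, chosen only for illustration. Each distinct k-mer becomes an edge between its (k-1)-mer prefix and suffix, and a path that uses every edge exactly once spells out the sequence.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a graph whose edges are the distinct k-mers found in the reads.

    Each k-mer contributes one edge from its (k-1)-mer prefix to its
    (k-1)-mer suffix. Assumption: every k-mer occurs once in the target.
    """
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the traversal deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Return a node sequence that uses every edge once (Hierholzer's algorithm).

    Assumption: the graph actually has an Eulerian path.
    """
    in_deg = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            in_deg[node] += 1
    # Start where out-degree exceeds in-degree by one, else at any node.
    start = next(
        (v for v in graph if len(graph[v]) - in_deg[v] == 1),
        next(iter(graph)),
    )
    remaining = {v: list(ws) for v, ws in graph.items()}
    stack, path = [start], []
    while stack:
        v = stack[-1]
        if remaining.get(v):
            stack.append(remaining[v].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is final in the path
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along an Eulerian path of the k-mer graph."""
    path = eulerian_path(de_bruijn_graph(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])
```

With the toy reads `["ATGCT", "TGCTG", "GCTGA"]` and `k = 3`, the repeated 2-mer `TG` becomes a single node that the path visits twice, and `assemble` reconstructs `ATGCTGA`. In general a graph may admit several Eulerian paths, so a sketch like this returns one consistent reconstruction rather than a guaranteed unique answer.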