Chaisson Mark J, Pevzner Pavel A
Bioinformatics Program, University of California San Diego, La Jolla, California 92093, USA.
Genome Res. 2008 Feb;18(2):324-30. doi: 10.1101/gr.7088808. Epub 2007 Dec 14.
In the last year, high-throughput sequencing technologies have progressed from proof-of-concept to production quality. While these methods produce high-quality reads, they have yet to produce reads comparable in length to Sanger-based sequencing. Current fragment assembly algorithms have been implemented and optimized for mate-paired Sanger-based reads, and thus do not perform well on the reads produced by short-read technologies. We present a new Eulerian assembler that generates nearly optimal short-read assemblies of bacterial genomes, and we describe an approach for assembling reads under the popular hybrid protocol, in which short reads are combined with long Sanger-based reads.