Broad Institute of MIT and Harvard, Charles Street, Cambridge, MA 02141, USA.
Genome Biol. 2009;10(10):R103. doi: 10.1186/gb-2009-10-10-r103. Epub 2009 Oct 1.
We demonstrate that genome sequences approaching finished quality can be generated from short paired reads. Using 36-base (fragment) and 26-base (jumping) reads from five microbial genomes of varied GC composition and sizes up to 40 Mb, ALLPATHS2 generated assemblies with long, accurate contigs and scaffolds. Velvet and EULER-SR were less accurate. For example, for Escherichia coli, the fraction of 10-kb stretches that were perfect was 99.8% (ALLPATHS2), 68.7% (Velvet), and 42.1% (EULER-SR).
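The accuracy figure quoted above (the fraction of 10-kb stretches that are perfect) can be illustrated with a minimal Python sketch. This is not the evaluation procedure used in the paper: the function name `perfect_window_fraction` is hypothetical, and the sketch assumes the assembly has already been aligned to the reference without gaps, so that both strings have the same length and share coordinates. It also assumes non-overlapping windows and ignores any trailing partial window.

```python
def perfect_window_fraction(assembly: str, reference: str, window: int = 10_000) -> float:
    """Fraction of non-overlapping windows in which assembly matches reference exactly.

    Assumption (not from the source): both sequences are pre-aligned,
    gap-free, and of equal length. A trailing partial window is ignored.
    """
    if len(assembly) != len(reference):
        raise ValueError("sequences must be pre-aligned to the same length")

    n_windows = len(reference) // window
    if n_windows == 0:
        raise ValueError("sequence is shorter than one window")

    # Count windows where every base in the assembly equals the reference base.
    perfect = sum(
        assembly[i * window:(i + 1) * window] == reference[i * window:(i + 1) * window]
        for i in range(n_windows)
    )
    return perfect / n_windows


if __name__ == "__main__":
    # Toy example with 4-base windows: the middle window has one mismatch.
    ref = "ACGTACGTACGT"
    asm = "ACGTACCTACGT"
    print(perfect_window_fraction(asm, ref, window=4))
```

In the toy example, two of the three windows match exactly. A single wrong base anywhere in a window makes the whole window imperfect, which is why this kind of metric is much stricter than a per-base error rate.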