Jeck William R, Reinhardt Josephine A, Baltrus David A, Hickenbotham Matthew T, Magrini Vincent, Mardis Elaine R, Dangl Jeffery L, Jones Corbin D
Department of Biology, University of North Carolina-Chapel Hill, Chapel Hill, NC 27599, USA.
Bioinformatics. 2007 Nov 1;23(21):2942-4. doi: 10.1093/bioinformatics/btm451. Epub 2007 Sep 24.
Inexpensive de novo genome sequencing, particularly in organisms with small genomes, is now possible using several new sequencing technologies. Some of these technologies, such as Illumina's Solexa sequencing, produce high genomic coverage by generating a very large number of short reads (approximately 30 bp). While prior work shows that partial assembly can be performed by k-mer extension in error-free reads, this algorithm is unsuccessful at the sequencing error rates found in practice. We present VCAKE (Verified Consensus Assembly by K-mer Extension), a modification of simple k-mer extension that overcomes error by using high depth of coverage. Though it is a simple modification of a previous approach, we show significant improvements in assembly results on simulated and experimental datasets that include error.
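The idea of extending a seed one base at a time while letting overlapping reads vote on the next base can be illustrated with a minimal sketch. This is not the VCAKE implementation: the function name `extend_right` and the parameters `min_overlap`, `min_votes`, `min_fraction` and `max_len` are hypothetical choices made for illustration, and the thresholds are arbitrary. It only shows how high coverage lets a consensus outvote an occasional erroneous read.

```python
from collections import Counter


def extend_right(seed, reads, min_overlap=8, min_votes=2,
                 min_fraction=0.6, max_len=1000):
    """Greedily extend `seed` to the right, one base per step.

    Hypothetical illustration only: every read whose prefix matches the
    current end of the contig casts one vote for the base that follows
    that prefix. The winning base is appended only if it has enough
    votes and a large enough share of all votes cast.
    """
    contig = seed
    while len(contig) < max_len:
        votes = Counter()
        for read in reads:
            # Longest usable overlap: leave at least one base to vote with.
            longest = min(len(read) - 1, len(contig))
            for k in range(longest, min_overlap - 1, -1):
                if contig.endswith(read[:k]):
                    votes[read[k]] += 1
                    break  # one vote per read, at its longest overlap
        if not votes:
            break  # no read overlaps the contig end
        base, count = votes.most_common(1)[0]
        if count < min_votes or count / sum(votes.values()) < min_fraction:
            break  # too little support, or the reads disagree
        contig += base
    return contig


# Toy data: 3x coverage by every 10-mer of a short made-up sequence,
# plus one read carrying a substitution error in its last base.
genome = "ACGTTGCATGGCTAAGCTTCAGGATCCGTA"
reads = [genome[i:i + 10] for i in range(len(genome) - 9)] * 3
reads.append("CGTTGCATGT")  # erroneous copy of genome[1:11]

contig = extend_right(genome[:10], reads)
print(contig == genome)
```

In this toy case the erroneous read casts a single dissenting vote at the first extension step and is outvoted by the correct reads. With only one read per position, the same error could instead have been appended to the contig or halted the extension, which is the failure mode the abstract attributes to simple k-mer extension.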