NICTA Victoria Research Laboratory, Department of Computing and Information Systems, The University of Melbourne, Parkville, Victoria 3010, Australia.
Bioinformatics. 2012 Jul 15;28(14):1937-8. doi: 10.1093/bioinformatics/bts297. Epub 2012 May 18.
The de novo assembly of short-read high-throughput sequencing data poses significant computational challenges: the volume of data is huge, the reads are tiny compared with the underlying sequence, and the reads contain significant numbers of sequencing errors. Numerous software packages allow users to assemble short reads, but most are limited to relatively small genomes (e.g. bacteria), require large computing infrastructure, or employ greedy algorithms and thus often do not yield high-quality results.
We have developed Gossamer, an implementation of the de Bruijn graph approach to assembly that requires close to the theoretical minimum of memory but still allows efficient processing. Our results show that it is space-efficient and produces high-quality assemblies.
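To make the de Bruijn approach concrete, the following is a minimal Python sketch and not Gossamer's implementation. It builds a toy de Bruijn graph from reads and computes an information-theoretic lower bound on the bits needed to store a set of distinct k-mers. The function names (`build_de_bruijn`, `min_bits_for_kmer_set`) are hypothetical, and reading "theoretical minimum" as the bound log2 C(4^k, n) is an assumption made here for illustration. Reverse complements and error correction are deliberately ignored.

```python
from collections import defaultdict
from math import comb, log2


def kmers(read, k):
    """Yield every length-k substring of a read, left to right."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def build_de_bruijn(reads, k):
    """Build a toy de Bruijn graph (hypothetical helper, not Gossamer's API).

    Nodes are (k-1)-mers; each k-mer seen in a read contributes one edge
    from its (k-1)-length prefix to its (k-1)-length suffix.
    """
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def min_bits_for_kmer_set(num_kmers, k):
    """Lower bound, in bits, for storing num_kmers distinct k-mers.

    Assumption: the bound is log2 of the number of ways to choose
    num_kmers items from the 4**k possible k-mers over {A, C, G, T}.
    """
    return log2(comb(4 ** k, num_kmers))


if __name__ == "__main__":
    reads = ["ACGTAC", "CGTACG"]
    graph = build_de_bruijn(reads, k=3)
    for node in sorted(graph):
        print(node, "->", sorted(graph[node]))
    # Count distinct k-mers (edges) and report the storage lower bound.
    num_edges = sum(len(successors) for successors in graph.values())
    print(f"{min_bits_for_kmer_set(num_edges, 3):.2f} bits")
```

A plain dictionary of sets like this uses far more memory than the bound suggests, which is why a space-efficient assembler needs a compact encoding of the k-mer set.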
Gossamer is available for non-commercial use from http://www.genomics.csse.unimelb.edu.au/product-gossamer.php.