Department of Mediamatics, Delft University of Technology, Delft, The Netherlands.
Bioinformatics. 2010 Sep 15;26(18):i433-9. doi: 10.1093/bioinformatics/btq366.
De novo assembly of a eukaryotic genome from next-generation sequencing data remains a challenging task. Over the past few years, several assemblers have been developed, each often suited to one specific type of sequencing data. Because the number of known genomes is expanding rapidly, it is becoming possible to use multiple reference genomes in assembly projects. We introduce an assembly integrator that makes use of all available data, i.e. multiple de novo assemblies and mappings against multiple related genomes, by optimizing a weighted combination of criteria.
The developed algorithm was applied to the de novo sequencing of the Saccharomyces cerevisiae CEN.PK 113-7D strain. Using Solexa and 454 read data, two de novo and three comparative assemblies were constructed and subsequently integrated, yielding 29 contigs covering more than 12 Mbp, a drastic improvement over the individual assemblies.
MAIA is available as a Matlab package and can be downloaded from http://bioinformatics.tudelft.nl.
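The idea of scoring candidate contig joins with a weighted combination of criteria can be illustrated with a minimal Python sketch. This is not MAIA's algorithm or its Matlab interface: the criterion names (`denovo_support`, `reference_support`, `overlap_quality`), the weights, the scores, and the contig labels are all hypothetical, and the sketch simply picks the highest-scoring join rather than performing a full optimization.

```python
from dataclasses import dataclass, field


@dataclass
class CandidateLink:
    """A proposed join between two contigs (hypothetical structure)."""
    left: str
    right: str
    # Criterion name -> score in [0, 1]; names are illustrative assumptions.
    criteria: dict = field(default_factory=dict)


def weighted_score(link, weights):
    """Weighted sum of the criteria; missing criteria count as 0."""
    return sum(w * link.criteria.get(name, 0.0) for name, w in weights.items())


def best_link(links, weights):
    """Return the candidate join with the highest weighted score."""
    return max(links, key=lambda link: weighted_score(link, weights))


# Hypothetical weights expressing how much each evidence source is trusted.
weights = {"denovo_support": 0.5, "reference_support": 0.3, "overlap_quality": 0.2}

# Two competing ways to extend contigA, with made-up evidence scores.
links = [
    CandidateLink("contigA", "contigB",
                  {"denovo_support": 1.0, "reference_support": 2 / 3, "overlap_quality": 0.9}),
    CandidateLink("contigA", "contigC",
                  {"denovo_support": 0.5, "reference_support": 1 / 3, "overlap_quality": 0.95}),
]

chosen = best_link(links, weights)
print(chosen.left, "->", chosen.right)
```

In this toy setup the join supported by both de novo assemblies and by two of the three reference mappings wins, even though the competing join has a slightly better overlap score; changing the weights shifts that trade-off.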