INRIA Rennes - Bretagne Atlantique, EPI Symbiose, Rennes, France.
BMC Bioinformatics. 2012 Mar 23;13:48. doi: 10.1186/1471-2105-13-48.
The analysis of next-generation sequencing data from large genomes is a timely research topic. Sequencers produce billions of short sequence fragments from newly sequenced organisms. Computational methods for reconstructing whole genomes or transcriptomes (de novo assemblers) are typically used to process such data, but these methods require large amounts of memory and computation time. Many basic biological questions could instead be answered by targeting specific information in the reads, thus avoiding complete assembly.
We present Mapsembler, an iterative, targeted micro-assembler that processes large read datasets on commodity hardware. Given a set of regions of interest, Mapsembler checks whether each region can be constructed from the reads and builds a short assembly around it, either as a plain sequence or as a graph showing contextual structure. We introduce new algorithms to retrieve approximate occurrences of a sequence from reads and to construct an extension graph. Among other results presented in this paper, Mapsembler retrieved previously described candidate fusion genes in human breast cancer and detected new ones.
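To make the idea of targeted extension concrete, the following is a minimal sketch, not Mapsembler's actual algorithm. It assumes a simplified greedy scheme: reads whose prefix matches the end of a starter sequence within a bounded number of substitutions vote for the next base, and the majority base is appended. The function names (`hamming_within`, `extend_right`) and parameters (`min_overlap`, `max_mismatches`, `max_steps`) are hypothetical, and the toy sequence is invented for illustration.

```python
from collections import Counter


def hamming_within(a, b, max_mismatches):
    """Return True if equal-length strings a and b differ at no more
    than max_mismatches positions (substitutions only, no indels)."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def extend_right(starter, reads, min_overlap=8, max_mismatches=1, max_steps=50):
    """Greedily extend `starter` to the right, one base per iteration.

    Hypothetical simplification: every read whose prefix approximately
    matches a suffix of the current sequence (overlap >= min_overlap)
    votes for the base that follows the overlap; the majority base wins.
    """
    seq = starter
    for _ in range(max_steps):
        votes = Counter()
        for read in reads:
            # o is the position in seq where the read would start.
            # The bounds guarantee overlap >= min_overlap and at least
            # one overhanging base in the read.
            first = max(0, len(seq) - len(read) + 1)
            last = len(seq) - min_overlap
            for o in range(first, last + 1):
                overlap = len(seq) - o
                if hamming_within(seq[o:], read[:overlap], max_mismatches):
                    votes[read[overlap]] += 1
        if not votes:
            break  # no read supports any further extension
        seq += votes.most_common(1)[0][0]
    return seq


# Toy data (invented): error-free reads of length 12 sampled every 3 bases.
genome = "ACGTTGCAAGGCTTACGATCGGATCCTAGA"
reads = [genome[i:i + 12] for i in range(0, len(genome) - 12 + 1, 3)]
assembled = extend_right(genome[:10], reads, min_overlap=8, max_mismatches=0)
print(assembled == genome)
```

A real tool would also extend to the left, handle reverse complements, and record branching alternatives as a graph rather than collapsing them by majority vote; the sketch only illustrates why indexing can stay local to the region of interest.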
Mapsembler is the first software that enables de novo discovery, around a region of interest, of repeats, SNPs, exon skipping, gene fusions, and other structural events directly from raw sequencing reads. Because indexing is localized, the memory footprint of Mapsembler is negligible. Mapsembler is released under the CeCILL license and can be freely downloaded from http://alcovna.genouest.org/mapsembler/.