School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
Biochem Biophys Res Commun. 2012 Sep 28;426(3):395-8. doi: 10.1016/j.bbrc.2012.08.101. Epub 2012 Aug 29.
Fragment assembly is one of the most important problems in sequence assembly. Algorithms for DNA fragment assembly based on de Bruijn graphs have been widely used, but they require a large amount of memory and running time to build the de Bruijn graph. Another drawback of the conventional de Bruijn approach is the loss of information. To overcome these shortcomings, this paper proposes a parallel strategy for constructing the de Bruijn graph. Its main feature is that it avoids partitioning the de Bruijn graph. A novel fragment assembly algorithm based on this parallel strategy is implemented in the MapReduce framework. Experimental results show that the parallel strategy can effectively improve computational efficiency and remove the memory limitations of the assembly algorithm based on Euler superpaths. This paper presents a useful attempt at assembling large-scale genome sequences using cloud computing.
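The sketch below runs on a single machine and illustrates the two ideas the abstract combines: building a de Bruijn graph through map, shuffle and reduce steps, and reading a sequence off an Eulerian path through that graph. It is not the authors' algorithm. The function names, the two-round split, the use of Hierholzer's algorithm and the toy reads are all assumptions made for illustration. MapReduce is simulated in-process rather than run on Hadoop. Euler superpaths, which additionally thread every read through the graph, are not implemented.

```python
from collections import defaultdict
from itertools import chain


# Round 1 (simulated MapReduce): collect the distinct k-mers.
def map_kmers(read, k):
    """Map (hypothetical): emit (k-mer, 1) for every k-mer in one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_kmer(kmer, counts):
    """Reduce: one record per distinct k-mer. The multiplicity is kept but
    unused here; it could serve to drop low-coverage (likely erroneous) k-mers."""
    return kmer, sum(counts)


# Round 2: each distinct k-mer becomes an edge between two (k-1)-mer nodes.
def build_de_bruijn(reads, k):
    """Return {node: sorted successors}, where node = (k-1)-mer."""
    pairs = chain.from_iterable(map_kmers(r, k) for r in reads)
    kmers = dict(reduce_kmer(km, c) for km, c in shuffle(pairs).items())
    edges = ((km[:-1], km[1:]) for km in kmers)
    return {node: sorted(succ) for node, succ in shuffle(edges).items()}


def eulerian_path(graph):
    """Hierholzer's algorithm: a path that uses every directed edge once."""
    if not graph:
        return []
    in_deg = defaultdict(int)
    for succ in graph.values():
        for v in succ:
            in_deg[v] += 1
    n_edges = sum(in_deg.values())
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next((u for u in sorted(graph)
                  if len(graph[u]) - in_deg[u] == 1), min(graph))
    remaining = {u: list(succ) for u, succ in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: emit node while backtracking
    path.reverse()
    if len(path) != n_edges + 1:
        raise ValueError("graph has no single Eulerian path")
    return path


def spell(path):
    """Glue consecutive (k-1)-mers; each node after the first adds one base."""
    return path[0] + "".join(node[-1] for node in path[1:]) if path else ""


if __name__ == "__main__":
    reads = ["ATGGCG", "GGCGTG", "CGTGCA"]  # toy reads from ATGGCGTGCA
    print(spell(eulerian_path(build_de_bruijn(reads, k=4))))  # ATGGCGTGCA
```

With k = 4 the toy graph is a simple chain, so the Eulerian path is unique. With k = 3, the 2-mers TG and GC each occur twice in the genome, and the graph admits two Eulerian paths, ATGGCGTGCA and ATGCGTGGCA. Only the first contains all three reads as substrings. This is the kind of repeat-induced ambiguity that a plain de Bruijn graph cannot resolve on its own and that read-level information, as used by the Euler superpath formulation, is meant to settle.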