Wang Chunyu, Guo Maozu, Liu Xiaoyan, Liu Yang, Zou Quan
BMC Med Genomics. 2015;8 Suppl 2(Suppl 2):S13. doi: 10.1186/1755-8794-8-S2-S13. Epub 2015 May 29.
DNA sequencing technology has evolved rapidly and now produces short reads in fast-growing volumes, which has led to a resurgence of research in whole-genome shotgun assembly algorithms. We start the assembly by clustering the short reads in a cloud computing framework; the clustering groups fragments by the similarity of the original consensus long sequences they come from. We condense each group of reads into a chain of seeds, where a seed is a substring with reads aligned to it, and then build a graph from these chains. Finally, we analyze the graph to find Euler paths, assemble the reads along each path into contigs, and lay out the contigs into scaffolds using mate-pair information. The results show that our algorithm is efficient and feasible for large read sets such as those produced by next-generation sequencing technology.
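The Euler-path step of the assembly can be illustrated with a minimal sketch. It does not reproduce the seed-chain graph described above, whose construction the abstract does not specify. Instead it assumes a standard de Bruijn-style graph built from k-mers and finds the path with Hierholzer's algorithm. The function names `build_graph`, `euler_path`, and `assemble`, and the parameter `k`, are hypothetical and chosen only for illustration. The sketch also assumes error-free reads, at least one read of length `k` or more, and a graph that actually contains an Euler path.

```python
from collections import defaultdict


def build_graph(reads, k):
    """Build a de Bruijn-style graph: each distinct k-mer adds an edge
    from its (k-1)-prefix to its (k-1)-suffix."""
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer not in seen:  # collapse k-mers shared by overlapping reads
                seen.add(kmer)
                graph[kmer[:-1]].append(kmer[1:])
    return graph


def euler_path(graph):
    """Return the node sequence of an Euler path (Hierholzer's algorithm).

    Assumption: the graph is non-empty and has an Euler path.
    """
    out_edges = {node: list(targets) for node, targets in graph.items()}
    balance = defaultdict(int)  # out-degree minus in-degree per node
    for node, targets in out_edges.items():
        balance[node] += len(targets)
        for target in targets:
            balance[target] -= 1
    # Start at the node with one surplus outgoing edge, else at any node.
    start = next((n for n, b in balance.items() if b == 1),
                 next(iter(out_edges)))
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if out_edges.get(node):
            stack.append(out_edges[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is final in the path
    return path[::-1]


def assemble(reads, k):
    """Spell out one contig by walking the Euler path node by node."""
    path = euler_path(build_graph(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])
```

For example, `assemble(["ATGGCG", "GGCGTG", "CGTGCA"], 4)` reconstructs `"ATGGCGTGCA"` from three overlapping reads. Real read sets contain sequencing errors and repeats, so a practical assembler would need error correction and graph simplification before this step, plus the mate-pair scaffolding described above.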