Luo Junwei, Wang Jianxin, Li Weilong, Zhang Zhen, Wu Fang-Xiang, Li Min, Pan Yi
School of Information Science and Engineering, Central South University, ChangSha, 410083, China, College of Computer Science and Technology, Henan Polytechnic University, JiaoZuo, 454000, China.
School of Information Science and Engineering, Central South University, ChangSha, 410083, China.
Bioinformatics. 2015 Dec 15;31(24):3988-90. doi: 10.1093/bioinformatics/btv487. Epub 2015 Aug 26.
In genome assembly, as sequencing coverage and genome size grow, most current software requires a large amount of memory to handle the volume of sequence data. However, most researchers cannot meet these computing resource requirements, which prevents most current software from being used in practice.
In this article, we present an updated algorithm called EPGA2, which adds several new modules and produces improved assembly results with a small memory footprint. To reduce peak memory in genome assembly, EPGA2 adopts the memory-efficient DSK to count k-mers and a revised BCALM to construct the de Bruijn graph. Moreover, EPGA2 parallelizes the contig merging step and adds error correction to its pipeline. Our experiments demonstrate that these changes make EPGA2 more useful for genome assembly.
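To make the two memory-critical steps named above concrete, the following is a minimal in-memory sketch of k-mer counting and de Bruijn graph construction. It is not DSK, BCALM, or EPGA2 code: those tools rely on disk partitioning and compaction to stay within a small memory budget, which this toy version does not attempt. The function names `count_kmers` and `build_de_bruijn`, the `min_count` filter, and the example reads are all hypothetical, and reverse complements are ignored for brevity.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every length-k substring across all reads (naive, all in memory)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_de_bruijn(kmer_counts, min_count=1):
    """Build a de Bruijn graph: nodes are (k-1)-mers, each kept k-mer is an edge.

    k-mers seen fewer than min_count times are skipped; this is an
    illustrative filter, not the error correction used by EPGA2.
    """
    graph = defaultdict(set)
    for kmer, count in kmer_counts.items():
        if count >= min_count:
            graph[kmer[:-1]].add(kmer[1:])  # prefix node -> suffix node
    return graph


# Hypothetical toy reads, for illustration only.
reads = ["ACGTAC", "CGTACG"]
kmer_counts = count_kmers(reads, k=3)
graph = build_de_bruijn(kmer_counts)
```

Holding every k-mer in a single in-memory table, as this sketch does, is the kind of approach whose memory use grows with coverage and genome size, which is the cost the abstract says EPGA2 aims to reduce.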
EPGA2 is publicly available for download at https://github.com/bioinfomaticsCSU/EPGA2.