EPGA2: memory-efficient de novo assembler.

Author information

Luo Junwei, Wang Jianxin, Li Weilong, Zhang Zhen, Wu Fang-Xiang, Li Min, Pan Yi

Affiliations

School of Information Science and Engineering, Central South University, ChangSha, 410083, China, College of Computer Science and Technology, Henan Polytechnic University, JiaoZuo, 454000, China.

School of Information Science and Engineering, Central South University, ChangSha, 410083, China.

Publication information

Bioinformatics. 2015 Dec 15;31(24):3988-90. doi: 10.1093/bioinformatics/btv487. Epub 2015 Aug 26.

DOI:10.1093/bioinformatics/btv487

PMID:26315905

Abstract

MOTIVATION

In genome assembly, as sequencing coverage and genome size grow, most current software requires a large amount of memory to handle the volume of sequence data. However, most researchers cannot meet these computing-resource requirements, which keeps much of this software from practical use.

RESULTS

In this article, we present EPGA2, an updated algorithm that adds several new modules and achieves improved assembly results with a small memory footprint. To reduce peak memory during genome assembly, EPGA2 uses the memory-efficient DSK to count k-mers and a revised version of BCALM to construct the de Bruijn graph. In addition, EPGA2 parallelizes the contig-merging step and adds an error-correction step to its pipeline. Our experiments demonstrate that these changes make EPGA2 more effective for genome assembly.
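DSK keeps peak memory low by never holding the full k-mer set in RAM: k-mers are routed by a hash into passes and on-disk partitions, and only one partition at a time is loaded and counted in an in-memory hash table. The Python sketch below illustrates that idea only. It is not EPGA2's or DSK's code (DSK is a standalone tool), and the function names, the `n_passes`/`n_parts`/`min_count` parameters, and the use of `zlib.crc32` as the partitioning hash are assumptions chosen for illustration.

```python
import os
import tempfile
from collections import Counter
from zlib import crc32

# Illustrative sketch of DSK-style disk-partitioned k-mer counting.
# All names and parameters are hypothetical; this is not EPGA2's or DSK's code.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
BASES = set("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def kmers(read: str, k: int):
    """Yield canonical k-mers of a read, skipping any that contain non-ACGT bases."""
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if set(kmer) <= BASES:
            yield canonical(kmer)


def partitioned_count(reads, k, n_passes=2, n_parts=4, min_count=2):
    """Count k-mers while holding only one partition's k-mers in memory at a time.

    `reads` must be re-iterable (e.g. a list), since every pass re-reads it.
    Each k-mer is routed by its hash to exactly one (pass, partition) pair, so
    counting each partition file independently gives exact totals.
    """
    solid = {}  # kept in memory here for simplicity; a real tool writes it to disk
    with tempfile.TemporaryDirectory() as tmp:
        for p in range(n_passes):
            # Pass p keeps only k-mers whose hash maps to p, which bounds the
            # temporary disk space used per pass.
            paths = [os.path.join(tmp, f"pass{p}_part{j}.txt") for j in range(n_parts)]
            files = [open(path, "w") for path in paths]
            for read in reads:
                for kmer in kmers(read, k):
                    h = crc32(kmer.encode())
                    if h % n_passes == p:
                        files[(h // n_passes) % n_parts].write(kmer + "\n")
            for f in files:
                f.close()
            # Load one partition at a time and count it with an in-memory hash table.
            for path in paths:
                with open(path) as f:
                    counts = Counter(line.rstrip("\n") for line in f)
                solid.update((km, c) for km, c in counts.items() if c >= min_count)
    return solid
```

A call such as `partitioned_count(reads, k=31, n_passes=2, n_parts=8)` returns the k-mers seen at least `min_count` times with their counts. Filtering out k-mers seen only once (typically sequencing errors) mirrors the solid-k-mer selection that usually precedes graph construction; the abstract pairs this counting step with a revised BCALM for building the de Bruijn graph, which is not sketched here.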

AVAILABILITY AND IMPLEMENTATION

EPGA2 is publicly available for download at https://github.com/bioinfomaticsCSU/EPGA2.

