LightAssembler: fast and memory-efficient assembly algorithm for high-throughput sequencing reads.

Affiliations

Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA; Department of Computer Science, Faculty of Computers and Information, Mansoura University, Mansoura 35516, Egypt.

Department of Computer Science, Faculty of Computers and Information, Mansoura University, Mansoura 35516, Egypt.

Publication information

Bioinformatics. 2016 Nov 1;32(21):3215-3223. doi: 10.1093/bioinformatics/btw470. Epub 2016 Jul 13.

DOI: 10.1093/bioinformatics/btw470
PMID: 27412092

Abstract

MOTIVATION

The deluge of sequenced data has outpaced Moore's Law, more than doubling every 2 years since next-generation sequencing (NGS) technologies were invented. Accordingly, we will be able to generate more and more data at high speed and fixed cost, but will lack the computational resources to store, process and analyze it. With error-prone, high-throughput NGS reads and genomic repeats, the assembly graph contains a massive number of redundant nodes and branching edges. Most assembly pipelines require this large graph to reside in memory before their workflows can start, which is intractable for mammalian genomes. Resource-efficient genome assemblers combine advanced computing techniques with innovative data structures to encode the assembly graph efficiently in computer memory.

RESULTS

LightAssembler is a lightweight assembly algorithm designed to be executed on a desktop machine. It uses a pair of cache-oblivious Bloom filters: one holds a uniform sample of [Formula: see text]-spaced sequenced k-mers, and the other holds k-mers classified as likely correct by a simple statistical test. LightAssembler contains a light implementation of the graph traversal and simplification modules that achieves assembly accuracy and contiguity comparable to those of competing tools. Our method reduces memory usage by [Formula: see text] compared to the resource-efficient assemblers, using benchmark datasets from the GAGE and Assemblathon projects. While LightAssembler can be considered a gap-based sequence assembler, different gap sizes result in an almost constant assembly size and genome coverage.
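
The two-filter idea described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `BloomFilter` class is a plain (not cache-oblivious) filter, all function names and the `gap` parameter are hypothetical, "spaced" is read here as skipping `gap` k-mers between consecutive samples, and the "likely correct" rule is a stand-in (membership in the sample filter) rather than the statistical test used in the paper.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter backed by a bytearray (illustrative only)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive one bit position per hash from salted BLAKE2b digests.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def all_kmers(read: str, k: int) -> list:
    """Every k-mer of the read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def sampled_kmers(read: str, k: int, gap: int) -> list:
    """k-mers taken while skipping `gap` k-mers between samples (assumed reading)."""
    return all_kmers(read, k)[::gap + 1]


def build_filters(reads, k, gap, num_bits=1 << 16, num_hashes=3):
    """Pass 1 fills the sample filter; pass 2 fills the 'likely correct' filter."""
    sample = BloomFilter(num_bits, num_hashes)
    for read in reads:
        for kmer in sampled_kmers(read, k, gap):
            sample.add(kmer)

    trusted = BloomFilter(num_bits, num_hashes)
    for read in reads:
        for kmer in all_kmers(read, k):
            # Stand-in rule, NOT the paper's statistical test:
            # keep a k-mer if the sample filter reports it as present.
            if kmer in sample:
                trusted.add(kmer)
    return sample, trusted


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "CGTACGTACG"]
    sample, trusted = build_filters(reads, k=4, gap=2)
    print(sampled_kmers(reads[0], 4, 2))
    print("ACGT" in sample, "ACGT" in trusted)
```

Because a Bloom filter stores only bits, the memory cost is fixed by `num_bits` regardless of how many k-mers are inserted, at the price of occasional false positives; this is the general trade-off that makes such filters attractive for memory-constrained assembly.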

AVAILABILITY AND IMPLEMENTATION

https://github.com/SaraEl-Metwally/LightAssembler

CONTACT: sarah_almetwally4@mans.edu.eg

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Similar articles

1. LightAssembler: fast and memory-efficient assembly algorithm for high-throughput sequencing reads.
   Bioinformatics. 2016 Nov 1;32(21):3215-3223. doi: 10.1093/bioinformatics/btw470. Epub 2016 Jul 13.
2. RResolver: efficient short-read repeat resolution within ABySS.
   BMC Bioinformatics. 2022 Jun 21;23(1):246. doi: 10.1186/s12859-022-04790-z.
3. Faucet: streaming de novo assembly graph construction.
   Bioinformatics. 2018 Jan 1;34(1):147-154. doi: 10.1093/bioinformatics/btx471.
4. RMI-DBG algorithm: A more agile iterative de Bruijn graph algorithm in short read genome assembly.
   J Bioinform Comput Biol. 2021 Apr;19(2):2150005. doi: 10.1142/S0219720021500050. Epub 2021 Apr 16.
5. FastEtch: A Fast Sketch-Based Assembler for Genomes.
   IEEE/ACM Trans Comput Biol Bioinform. 2019 Jul-Aug;16(4):1091-1106. doi: 10.1109/TCBB.2017.2737999. Epub 2017 Sep 11.
6. Integration of string and de Bruijn graphs for genome assembly.
   Bioinformatics. 2016 May 1;32(9):1301-7. doi: 10.1093/bioinformatics/btw011. Epub 2016 Jan 10.
7. NeatFreq: reference-free data reduction and coverage normalization for De Novo sequence assembly.
   BMC Bioinformatics. 2014 Nov 19;15(1):357. doi: 10.1186/s12859-014-0357-3.
8. A space and time-efficient index for the compacted colored de Bruijn graph.
   Bioinformatics. 2018 Jul 1;34(13):i169-i177. doi: 10.1093/bioinformatics/bty292.
9. FSG: Fast String Graph Construction for De Novo Assembly.
   J Comput Biol. 2017 Oct;24(10):953-968. doi: 10.1089/cmb.2017.0089. Epub 2017 Jul 17.
10. QuorUM: An Error Corrector for Illumina Reads.
   PLoS One. 2015 Jun 17;10(6):e0130821. doi: 10.1371/journal.pone.0130821. eCollection 2015.

Cited by

1. Genome sequence assembly algorithms and misassembly identification methods.
   Mol Biol Rep. 2022 Nov;49(11):11133-11148. doi: 10.1007/s11033-022-07919-8. Epub 2022 Sep 23.
2. Empirical evaluation of methods for genome assembly.
   PeerJ Comput Sci. 2021 Jul 9;7:e636. doi: 10.7717/peerj-cs.636. eCollection 2021.
3. Faucet: streaming de novo assembly graph construction.
   Bioinformatics. 2018 Jan 1;34(1):147-154. doi: 10.1093/bioinformatics/btx471.