SWAP-Assembler: scalable and efficient genome assembly towards thousands of cores.

Publication information

BMC Bioinformatics. 2014;15 Suppl 9(Suppl 9):S2. doi: 10.1186/1471-2105-15-S9-S2. Epub 2014 Sep 10.

DOI:10.1186/1471-2105-15-S9-S2
PMID:25253533
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4168705/
Abstract

BACKGROUND

There is a widening gap between the throughput of massively parallel sequencing machines and the ability to analyze the resulting sequencing data. Traditional assembly methods require long execution times and large amounts of memory on a single workstation, which limits their use on such massive data.

RESULTS

This paper presents a highly scalable assembler named SWAP-Assembler for processing massive sequencing data using thousands of cores, where SWAP is an acronym for Small World Asynchronous Parallel model. The paper provides a mathematical description of the multi-step bi-directed graph (MSG) to resolve the computational interdependence in merging edges, and develops a highly scalable computational framework for SWAP that automatically performs the parallel computation of all operations. Graph cleaning and contig extension are also included to generate high-quality contigs. Experimental results show that SWAP-Assembler scales up to 2048 cores on the Yanhuang dataset, taking only 26 minutes, which is better than several other parallel assemblers such as ABySS, Ray, and PASHA. Results also show that SWAP-Assembler can generate high-quality contigs with good N50 size and low error rate; in particular, it generated the longest N50 contig sizes for the Fish and Yanhuang datasets.
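The N50 size used above as a contig-quality measure can be illustrated with a minimal sketch. It assumes the common definition of N50 (the largest length L such that contigs of length at least L together cover at least half of the assembled bases). The function name `n50` and the toy lengths are hypothetical and are not taken from the paper or from SWAP-Assembler itself.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Assumed definition: the largest length L such that contigs of
    length >= L together cover at least half of all assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Made-up contig lengths for illustration only (total = 300 bases).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # 80 + 70 = 150 reaches half of 300, so N50 is 70
```

A larger N50 means that more of the assembly sits in long contigs, which is why it is reported alongside the error rate when assemblers are compared.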

CONCLUSIONS

In this paper, we presented a highly scalable and efficient genome assembly software, SWAP-Assembler. Compared with several other assemblers, it showed very good performance in terms of scalability and contig quality. This software is available at: https://sourceforge.net/projects/swapassembler.
