Readjoiner: a fast and memory efficient string graph-based sequence assembler.

Affiliation

Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany.

Publication information

BMC Bioinformatics. 2012 May 6;13:82. doi: 10.1186/1471-2105-13-82.

DOI:10.1186/1471-2105-13-82
PMID:22559072
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3507659/
Abstract

BACKGROUND

Ongoing improvements in throughput of the next-generation sequencing technologies challenge the current generation of de novo sequence assemblers. Most recent sequence assemblers are based on the construction of a de Bruijn graph. An alternative framework of growing interest is the assembly string graph, not necessitating a division of the reads into k-mers, but requiring fast algorithms for the computation of suffix-prefix matches among all pairs of reads.
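To make the term concrete, a suffix-prefix match between two reads is a suffix of one read that equals a prefix of the other. The following is a minimal brute-force sketch of that definition only; the function name, the `min_len` parameter, and the toy reads are assumptions made for illustration and are not taken from Readjoiner.

```python
def longest_suffix_prefix_match(a: str, b: str, min_len: int = 1) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix
    of `b`, or 0 if no such match of at least `min_len` characters exists.

    Brute force for illustration: tries every candidate length, longest first.
    """
    max_len = min(len(a), len(b))
    for length in range(max_len, min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


# Hypothetical toy reads, not real sequencing data.
reads = ["ACGTAC", "GTACGG", "ACGGTT"]

# All ordered pairs of distinct reads with a match of at least 3 characters.
matches = {
    (i, j): longest_suffix_prefix_match(reads[i], reads[j], min_len=3)
    for i in range(len(reads))
    for j in range(len(reads))
    if i != j
}
matches = {pair: length for pair, length in matches.items() if length > 0}
print(matches)
```

Comparing all pairs this way takes time quadratic in the number of reads, which is why the abstract stresses the need for fast algorithms.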

RESULTS

Here we present efficient methods for the construction of a string graph from a set of sequencing reads. Our approach employs suffix sorting and scanning methods to compute suffix-prefix matches. Transitive edges are recognized and eliminated early in the process and the graph is efficiently constructed including irreducible edges only.
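The idea of keeping only irreducible edges can be sketched on a toy example. The code below is a simplified illustration under several assumptions: it finds overlaps by brute force rather than by suffix sorting and scanning, it removes transitive edges after all overlaps are known rather than early in the process, and it ignores reverse complements, sequencing errors, and contained reads. All function names and the example reads are hypothetical.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int) -> int:
    """Longest suffix of `a` equal to a prefix of `b` (0 if below min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def overlap_edges(reads: list[str], min_len: int) -> dict[tuple[int, int], str]:
    """Map each overlapping ordered pair (i, j) to its edge label:
    the part of read j that extends beyond read i."""
    edges = {}
    for i, j in permutations(range(len(reads)), 2):
        length = overlap(reads[i], reads[j], min_len)
        if length:
            edges[(i, j)] = reads[j][length:]
    return edges


def irreducible_edges(edges: dict[tuple[int, int], str]) -> dict[tuple[int, int], str]:
    """Drop every edge i->k whose label is spelled by a path i->j->k."""
    out: dict[int, dict[int, str]] = {}
    for (i, j), label in edges.items():
        out.setdefault(i, {})[j] = label

    reduced = {}
    for (i, k), label in edges.items():
        transitive = any(
            k in out.get(j, {}) and label_ij + out[j][k] == label
            for j, label_ij in out[i].items()
            if j != k
        )
        if not transitive:
            reduced[(i, k)] = label
    return reduced


# Three overlapping windows of the made-up sequence "ATGCGTACCTGA".
reads = ["ATGCGTAC", "GCGTACCT", "GTACCTGA"]
all_edges = overlap_edges(reads, min_len=3)
string_graph = irreducible_edges(all_edges)
print(all_edges)
print(string_graph)
```

In this example the edge from read 0 to read 2 carries the label "CTGA", which is exactly the concatenation of the labels on the path through read 1 ("CT" then "GA"), so it is transitive and is dropped. The remaining two edges are the irreducible ones.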

CONCLUSIONS

Our suffix-prefix match determination and string graph construction algorithms have been implemented in the software package Readjoiner. Comparison with existing string graph-based assemblers shows that Readjoiner is faster and more space efficient. Readjoiner is available at http://www.zbh.uni-hamburg.de/readjoiner.

