Konnector v2.0: pseudo-long reads from paired-end sequencing data.

Author information

Vandervalk Benjamin P, Yang Chen, Xue Zhuyi, Raghavan Karthika, Chu Justin, Mohamadi Hamid, Jackman Shaun D, Chiu Readman, Warren René L, Birol Inanç

Publication information

BMC Med Genomics. 2015;8 Suppl 3(Suppl 3):S1. doi: 10.1186/1755-8794-8-S3-S1. Epub 2015 Sep 23.

DOI:10.1186/1755-8794-8-S3-S1
PMID:26399504
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4582294/
Abstract

BACKGROUND

Reading the nucleotides from both ends of a DNA fragment is called paired-end tag (PET) sequencing. When the fragment is longer than the combined length of the two reads, a gap of unsequenced nucleotides remains between the reads of a pair. If the target in such experiments is sequenced deeply enough to provide redundant coverage, it may be possible to bridge these gaps using bioinformatics methods. Konnector is a local de novo assembly tool that addresses this problem. Here we report on version 2.0 of our tool.
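The gap described above comes from simple arithmetic on the fragment and read lengths. The Python sketch below assumes that both reads start at opposite ends of the fragment. The function name `unsequenced_gap` is hypothetical, and so is the 2 x 150 bp / 500 bp library; neither is a value from the paper.

```python
def unsequenced_gap(fragment_length: int, read1_length: int, read2_length: int) -> int:
    """Bases between the two reads of a pair that neither read covers.

    Assumes the reads start at opposite ends of the fragment; a result of 0
    means the reads meet or overlap, so there is nothing to bridge.
    """
    return max(0, fragment_length - (read1_length + read2_length))


# Hypothetical library: 2 x 150 bp reads from fragments of about 500 bp.
print(unsequenced_gap(500, 150, 150))  # 200 unsequenced bases
print(unsequenced_gap(250, 150, 150))  # 0: the reads overlap by 50 bp
```

If such a gap is bridged, the pair can be reported as one pseudo-long read that spans the whole fragment (500 bp in the first case).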

RESULTS

Konnector uses a probabilistic and memory-efficient data structure called a Bloom filter to represent a k-mer spectrum: all sequences of length k that occur in an input file, such as the collection of reads in a PET sequencing experiment. It performs look-ups to this data structure to construct an implicit de Bruijn graph, which describes (k-1)-base-pair overlaps between adjacent k-mers. It traverses this graph to bridge the gap between a given pair of flanking sequences.
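The three steps in the paragraph above can be illustrated with a minimal Python sketch. A Bloom filter stores the k-mer spectrum. The de Bruijn graph is never stored: the successors of a k-mer are found by testing its four possible one-base extensions against the filter. A bounded depth-first search then looks for a path from the last k-mer of one read to the first k-mer of its mate.

This is a simplified illustration under stated assumptions, not Konnector's implementation:
- All class and function names are hypothetical.
- Only the forward strand is indexed, so the mate read is assumed to be already reverse-complemented into the same orientation.
- A gap counts as bridged only when exactly one path is found within `max_added` bases. This is a rule chosen for the sketch.

```python
import hashlib
import random
from typing import Iterable, Iterator, List, Optional, Tuple


class BloomFilter:
    """Probabilistic set: no false negatives, a small rate of false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive all bit positions from two 64-bit halves of one digest.
        digest = hashlib.blake2b(item.encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # odd step size
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def build_kmer_spectrum(reads: Iterable[str], k: int, bloom: BloomFilter) -> None:
    """Insert every k-mer of every read (forward strand only in this sketch)."""
    for read in reads:
        for i in range(len(read) - k + 1):
            bloom.add(read[i:i + k])


def successors(kmer: str, bloom: BloomFilter) -> List[str]:
    """Implicit de Bruijn graph edges: one-base extensions (k-1 overlap) present in the filter."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bloom]


def bridge_gap(left: str, right: str, k: int, bloom: BloomFilter,
               max_added: int) -> Optional[str]:
    """Join `left` and `right` if exactly one path links them within `max_added` new bases."""
    start, goal = left[-k:], right[:k]
    found: List[str] = []
    stack: List[Tuple[str, str]] = [(start, "")]  # (current k-mer, bases appended after `left`)
    while stack and len(found) < 2:  # stop early once the gap is known to be ambiguous
        kmer, added = stack.pop()
        if kmer == goal:
            found.append(added)
            continue
        if len(added) >= max_added:
            continue
        for nxt in successors(kmer, bloom):
            stack.append((nxt, added + nxt[-1]))
    if len(found) != 1:
        return None  # no path (coverage hole) or several paths (repeat or false positive)
    # left + added ends with the goal k-mer, which is also the first k bases of `right`.
    return left + found[0] + right[k:]


rng = random.Random(7)
genome = "".join(rng.choice("ACGT") for _ in range(200))
k = 15
reads = [genome[i:i + 50] for i in range(0, 151)]  # tiled 50 bp reads, full coverage
bloom = BloomFilter(num_bits=1 << 16, num_hashes=4)
build_kmer_spectrum(reads, k, bloom)
left, right = genome[0:30], genome[100:130]  # a read pair with a 70 bp gap
print(bridge_gap(left, right, k, bloom, max_added=200) == genome[:130])

holey = BloomFilter(num_bits=1 << 16, num_hashes=4)
build_kmer_spectrum([genome[0:60], genome[90:200]], k, holey)  # no reads span 60..90
print(bridge_gap(left, right, k, holey, max_added=200))
```

In the first run, every k-mer between the two reads is in the filter, so the pair is joined into the full 130 bp fragment. In the second run, the filter is rebuilt from reads that leave a coverage hole between positions 60 and 90. The walk dead-ends there and `bridge_gap` returns `None`.

False positives in the filter can add spurious branches. Rejecting such ambiguous pairs is a choice of this sketch, not a documented behaviour of Konnector.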

CONCLUSIONS

Here we report the performance of Konnector v2.0 on simulated and experimental datasets and compare it against other tools with similar functionality. Because it represents k-mers with 1.5 bytes of memory on average, Konnector can scale to very large genomes. With our parallel implementation, it can also process over a billion bases on commodity hardware.
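The 1.5-byte figure can be put in context with the textbook Bloom filter approximation, p ≈ (1 − e^(−h/b))^h, for b bits per item and h hash functions. This sketch rests on several assumptions, none of which come from the abstract:
- The whole per-k-mer budget is Bloom filter bits, i.e. 12 bits per distinct k-mer.
- The hash-function counts tried below are illustrative; the number Konnector actually uses is not given.
- The count of 10^9 distinct k-mers is hypothetical.

```python
import math

BYTES_PER_KMER = 1.5                 # average memory per k-mer quoted in the abstract
BITS_PER_KMER = BYTES_PER_KMER * 8   # 12 bits, assuming all of it is Bloom filter bits


def bloom_fpr(bits_per_item: float, num_hashes: int) -> float:
    """Standard Bloom filter false-positive estimate (1 - e^(-h/b))^h."""
    return (1.0 - math.exp(-num_hashes / bits_per_item)) ** num_hashes


def filter_size_gib(num_distinct_kmers: int, bytes_per_kmer: float = BYTES_PER_KMER) -> float:
    """Memory for the k-mer spectrum at a fixed per-k-mer cost, in GiB."""
    return num_distinct_kmers * bytes_per_kmer / 2**30


# The estimate is minimised near h = b * ln 2, i.e. about 8.3 hash functions for b = 12.
print(f"optimal h ~ {BITS_PER_KMER * math.log(2):.1f}")
for h in (1, 4, 8):
    print(f"h={h}: FPR ~ {bloom_fpr(BITS_PER_KMER, h):.4f}")
# Hypothetical spectrum of one billion distinct k-mers.
print(f"{filter_size_gib(10**9):.2f} GiB")
```

Under these assumptions, the estimate drops from about 8% with one hash function to about 0.3% with eight. A billion distinct k-mers would occupy roughly 1.4 GiB. The false-positive rate Konnector actually achieves depends on parameters the abstract does not state.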

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d71/4582294/a60f4f36b7fb/1755-8794-8-S3-S1-1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d71/4582294/821b196941e8/1755-8794-8-S3-S1-2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d71/4582294/5bc4b2d4669d/1755-8794-8-S3-S1-3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d71/4582294/b534b88f58a1/1755-8794-8-S3-S1-4.jpg
