PTPan--overcoming memory limitations in oligonucleotide string matching for primer/probe design.

Affiliation

Department of Informatics, Technische Universität München, Boltzmannstrasse 3, 85748 Garching, Germany.

Publication information

Bioinformatics. 2011 Oct 15;27(20):2797-805. doi: 10.1093/bioinformatics/btr483. Epub 2011 Aug 19.

DOI:10.1093/bioinformatics/btr483
PMID:21856736
Abstract

MOTIVATION

In silico primer/probe design for nucleic acid diagnostics demands non-heuristic exact and approximate oligonucleotide string matching across huge nucleic acid sequence collections. Unfortunately, public sequence repositories grow much faster than computer hardware performance and main memory capacity do. This growth poses severe problems for existing oligonucleotide primer/probe design applications and calls for new approaches based on space-efficient indexing structures.

RESULTS

We developed PTPan (pronounced 'Peter Pan'; 'PT' stands for Position Tree, the earlier name of suffix trees), a space-efficient indexing structure for approximate oligonucleotide string matching in nucleic acid sequence data. Based on suffix trees, it combines partitioning, truncation and a new suffix tree stream compression to deal with large amounts of aligned and unaligned data. PTPan operates efficiently in main memory and on secondary storage, balancing memory consumption against runtime during construction and application. Based on PTPan, applications supporting similarity search and primer/probe design have been implemented, namely FindFamily, ProbeMatch and ProbeDesign. All three use a weighted Levenshtein distance metric for approximate queries to find and rate matches with indels as well as substitutions. We integrated PTPan into the widely used software package ARB to demonstrate usability and performance. Comparing PTPan with the original ARB index on the very large ssu-rRNA database SILVA, we observed a shorter construction time, extended functionality and dramatically reduced memory requirements, at the price of longer but still reasonable query times. PTPan enables indexing of huge nucleic acid sequence collections at reasonable application response times. Because it is not limited by main memory, PTPan constitutes a major advancement in rapid oligonucleotide string matching for primer/probe design, now and in the future, given the enormous growth of molecular sequence data.
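To make the weighted Levenshtein matching mentioned above concrete, the following is a minimal sketch of approximate probe matching with separate costs for substitutions and indels. It is not PTPan's implementation: it scans the target sequence directly with dynamic programming instead of traversing a suffix tree, and the function name `weighted_matches` and the default weights `sub_cost` and `indel_cost` are assumptions chosen for illustration, since the abstract does not state the actual weighting scheme.

```python
from typing import List, Tuple


def weighted_matches(probe: str, target: str, max_cost: float,
                     sub_cost: float = 1.0,      # hypothetical weight
                     indel_cost: float = 1.5     # hypothetical weight
                     ) -> List[Tuple[int, float]]:
    """Return (end_index, cost) for every position in `target` where
    `probe` matches a substring ending there with cost <= max_cost."""
    m = len(probe)
    # prev[i]: cheapest cost of aligning probe[:i] to a substring of the
    # target ending just before the current position. Before any target
    # base is read, probe[:i] can only be aligned by i deletions.
    prev = [i * indel_cost for i in range(m + 1)]
    hits: List[Tuple[int, float]] = []
    for j, base in enumerate(target):
        # curr[0] stays 0.0: a match may start at any target position.
        curr = [0.0] * (m + 1)
        for i in range(1, m + 1):
            diagonal = prev[i - 1] + (0.0 if probe[i - 1] == base else sub_cost)
            extra_in_target = prev[i] + indel_cost        # insertion
            missing_in_target = curr[i - 1] + indel_cost  # deletion
            curr[i] = min(diagonal, extra_in_target, missing_in_target)
        if curr[m] <= max_cost:
            hits.append((j, curr[m]))
        prev = curr
    return hits


if __name__ == "__main__":
    # Exact occurrence of the probe, ending at index 5 of the target.
    print(weighted_matches("ACGT", "TTACGTTT", max_cost=0.0))
```

Raising `max_cost` admits hits with mismatches or gaps, and the returned cost can serve to rank them. An index-based tool would apply the same cost model while descending the tree, pruning branches whose accumulated cost already exceeds the threshold.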

AVAILABILITY

Supplementary Material, the PTPan stand-alone library and the ARB-PTPan binary are available at http://ptpan.lrr.in.tum.de/.

CONTACT

meierh@in.tum.de

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
