
Structator: fast index-based search for RNA sequence-structure patterns.

Affiliation

Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany.

Publication information

BMC Bioinformatics. 2011 May 27;12:214. doi: 10.1186/1471-2105-12-214.

DOI:10.1186/1471-2105-12-214
PMID:21619640
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3154205/
Abstract

BACKGROUND

The secondary structure of RNA molecules is intimately related to their function and is often more conserved than the sequence. Hence, the important task of searching databases for RNAs requires matching sequence-structure patterns. Unfortunately, current tools for this task have, at best, a running time that is linear in the size of the sequence database. Furthermore, established index data structures for fast sequence matching, such as suffix trees or suffix arrays, cannot benefit from the complementarity constraints introduced by the secondary structure of RNAs.

RESULTS

We present a novel method and readily applicable software for time-efficient matching of RNA sequence-structure patterns in sequence databases. Our approach is based on affix arrays, a recently introduced index data structure that is preprocessed from the target database. Affix arrays support bidirectional pattern search, which is required for efficiently handling the structural constraints of the pattern. Structural patterns such as stem-loops can be matched inside out: the loop region is matched first, and then the pairing bases on its boundaries are matched consecutively. This allows base-pairing information to be exploited for search space reduction and leads to an expected running time that is sublinear in the size of the sequence database. Incorporating a new chaining approach into the search of RNA sequence-structure patterns enables molecules that fold into complex secondary structures to be described by multiple ordered patterns. The chaining approach removes spurious matches from the set of intermediate results, in particular matches of patterns with little specificity. In benchmark experiments on the Rfam database, our method runs up to two orders of magnitude faster than previous methods.

CONCLUSIONS

Because its expected running time is sublinear, the presented method is well suited for RNA sequence-structure pattern matching in large sequence databases. RNA molecules containing several stem-loop substructures can be described by multiple sequence-structure patterns, and their matches are efficiently handled by a novel chaining method. Beyond our algorithmic contributions, with Structator we provide a complete and robust open-source software solution for index-based search of RNA sequence-structure patterns. The Structator software is available at http://www.zbh.uni-hamburg.de/Structator.
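The abstract does not spell out the chaining algorithm, so the following Python sketch shows only a generic version of the idea under stated assumptions: each pattern's matches are half-open `(start, end)` intervals, a chain takes one match per pattern in the given order, and consecutive matches must not overlap and may be separated by at most `max_gap` positions (a hypothetical parameter). Matches that cannot belong to any complete chain are discarded as spurious. A forward pass keeps matches with a valid prefix chain; a backward pass then keeps those that can also reach the last pattern.

```python
Match = tuple[int, int]  # hypothetical (start, end) half-open interval in the target


def chain_filter(matches: list[list[Match]], max_gap: int) -> list[list[Match]]:
    """Keep only matches that lie on at least one complete ordered chain.

    A chain picks one match per pattern, in pattern order; each match must
    start at or after the previous one ends, at most max_gap positions later.
    """
    def follows(a: Match, b: Match) -> bool:
        return a[1] <= b[0] <= a[1] + max_gap

    if not matches:
        return []
    # Forward pass: matches reachable from some match of the first pattern.
    forward = [list(matches[0])]
    for i in range(1, len(matches)):
        forward.append([b for b in matches[i]
                        if any(follows(a, b) for a in forward[i - 1])])
    # Backward pass: of those, keep matches that can still reach the last pattern.
    kept: list[list[Match]] = [[] for _ in matches]
    kept[-1] = forward[-1]
    for i in range(len(matches) - 2, -1, -1):
        kept[i] = [a for a in forward[i]
                   if any(follows(a, b) for b in kept[i + 1])]
    return kept
```

For example, `chain_filter([[(0, 10), (50, 60)], [(12, 20), (100, 110)], [(22, 30)]], 5)` keeps only `(0, 10)`, `(12, 20)` and `(22, 30)`; the other two matches have no compatible neighbours. Each pass compares adjacent match lists pairwise, so the cost is quadratic in the number of matches per pattern, which is acceptable for a sketch but not necessarily how Structator implements its chaining.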


Similar articles

1. Structator: fast index-based search for RNA sequence-structure patterns.
   BMC Bioinformatics. 2011 May 27;12:214. doi: 10.1186/1471-2105-12-214.
2. Fast online and index-based algorithms for approximate search of RNA sequence-structure patterns.
   BMC Bioinformatics. 2013 Jul 17;14:226. doi: 10.1186/1471-2105-14-226.
3. RNAPattMatch: a web server for RNA sequence/structure motif detection based on pattern matching with flexible gaps.
   Nucleic Acids Res. 2015 Jul 1;43(W1):W507-12. doi: 10.1093/nar/gkv435. Epub 2015 May 4.
4. ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs.
   BMC Bioinformatics. 2014 Dec 31;15(1):404. doi: 10.1186/s12859-014-0404-0.
5. Lightweight comparison of RNAs based on exact sequence-structure matches.
   Bioinformatics. 2009 Aug 15;25(16):2095-102. doi: 10.1093/bioinformatics/btp065. Epub 2009 Feb 2.
6. Identification of consensus RNA secondary structures using suffix arrays.
   BMC Bioinformatics. 2006 May 5;7:244. doi: 10.1186/1471-2105-7-244.
7. A method for aligning RNA secondary structures and its application to RNA motif detection.
   BMC Bioinformatics. 2005 Apr 7;6:89. doi: 10.1186/1471-2105-6-89.
8. A structure-based flexible search method for motifs in RNA.
   J Comput Biol. 2007 Sep;14(7):908-26. doi: 10.1089/cmb.2007.0061.
9. GUUGle: a utility for fast exact matching under RNA complementary rules including G-U base pairing.
   Bioinformatics. 2006 Mar 15;22(6):762-4. doi: 10.1093/bioinformatics/btk041. Epub 2006 Jan 10.
10. StructMiner: a tool for alignment and detection of conserved secondary structure.
   Genome Inform. 2004;15(2):102-11.

Cited by

1. Finding and Characterizing Repeats in Plant Genomes.
   Methods Mol Biol. 2022;2443:327-385. doi: 10.1007/978-1-0716-2067-0_18.
2. Alignment-free comparative genomic screen for structured RNAs using coarse-grained secondary structure dot plots.
   BMC Genomics. 2017 Dec 2;18(1):935. doi: 10.1186/s12864-017-4309-y.
3. RNA motif search with data-driven element ordering.
