
LocARNAscan: Incorporating thermodynamic stability in sequence and structure-based RNA homology search.

Authors

Will Sebastian, Siebauer Michael F, Heyne Steffen, Engelhardt Jan, Stadler Peter F, Backofen Rolf

Affiliations

Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, Leipzig D-04107, Germany.

Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-Universität Freiburg, Georges-Köhler-Allee 106, Freiburg D-79110, Germany.

Publication information

Algorithms Mol Biol. 2013 Apr 20;8:14. doi: 10.1186/1748-7188-8-14. eCollection 2013.

DOI: 10.1186/1748-7188-8-14
PMID: 23601347
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3716875/

Abstract

BACKGROUND

The search for distant homologs has become an important issue in genome annotation. A particular difficulty is posed by divergent homologs that have lost recognizable sequence similarity. The same problem arises in the recognition of novel members of large classes of RNAs such as snoRNAs or microRNAs that consist of families unrelated by common descent. Current homology search tools for structured RNAs are either based entirely on sequence similarity (such as blast or hmmer) or combine sequence and secondary structure. The most prominent example of the latter class of tools is Infernal. Alternatives are descriptor-based methods. In most practical applications published to date, however, the information contained in covariance models or manually prescribed search patterns is dominated by sequence information. Here we ask two related questions: (1) Is secondary structure alone informative for homology search and the detection of novel members of RNA classes? (2) To what extent is the thermodynamic propensity of the target sequence to fold into the correct secondary structure helpful for this task?

RESULTS

Sequence-structure alignment can be used as an alternative search strategy. In this scenario, the query consists of a base pairing probability matrix, which can be derived either from a single sequence or from a multiple alignment representing a set of known representatives. Sequence information can be optionally added to the query. The target sequence is pre-processed to obtain local base pairing probabilities. As a search engine we devised a semi-global scanning variant of LocARNA's algorithm for sequence-structure alignment. The LocARNAscan tool is optimized for speed and low memory consumption. In benchmarking experiments on artificial data we observe that the inclusion of thermodynamic stability is helpful, albeit only in a regime of extremely low sequence information in the query. We observe, furthermore, that the sensitivity is bounded in particular by the limited accuracy of the predicted local structures of the target sequence.
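To illustrate what "semi-global scanning" means here, the following is a minimal sketch that assumes a sequence-only scoring scheme. It is not the LocARNAscan algorithm, which scores base pairing probabilities in addition to (optional) sequence information. The function name `semiglobal_scan` and the parameters `match`, `mismatch`, and `gap` are hypothetical and chosen only for this example. The query must be aligned in full, while the hit may start and end anywhere in the target. Only one column of the dynamic programming matrix is kept at a time, so memory grows with the query length rather than the target length.

```python
def semiglobal_scan(query, target, match=2, mismatch=-1, gap=-2):
    """Score semi-global alignments of `query` against every end position in `target`.

    Returns a list `scores` of length len(target) + 1, where scores[j] is the
    best score of an alignment that consumes the whole query and ends at
    target position j (1-based). scores[0] aligns the query to gaps only.
    Illustrative sequence-only scoring; all parameters are assumptions.
    """
    m = len(query)
    # Column for the empty target prefix: every query symbol faces a gap.
    prev = [i * gap for i in range(m + 1)]
    scores = [prev[m]]
    for t in target:
        # cur[0] = 0 means a hit may start at any target position for free.
        cur = [0] * (m + 1)
        for i in range(1, m + 1):
            sub = match if query[i - 1] == t else mismatch
            cur[i] = max(
                prev[i - 1] + sub,  # align query[i-1] with target symbol t
                prev[i] + gap,      # t is left unaligned (gap in the query)
                cur[i - 1] + gap,   # query[i-1] is left unaligned (gap in the target)
            )
        scores.append(cur[m])
        prev = cur  # keep only one column: memory is O(len(query))
    return scores


if __name__ == "__main__":
    scores = semiglobal_scan("GCAU", "AAGCAUAA")
    best_end = max(range(len(scores)), key=scores.__getitem__)
    print(best_end, scores[best_end])
```

In a scanning setting, one would typically report every end position whose score exceeds a chosen threshold instead of only the single best one; how such a threshold is set is outside the scope of this sketch.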

CONCLUSIONS

Although we demonstrate that a purely structure-based homology search is feasible in principle, it is unlikely to outperform tools such as Infernal in most application scenarios, where a substantial amount of sequence information is typically available. The LocARNAscan approach will profit, however, from high throughput methods to determine RNA secondary structure. In transcriptome-wide applications, such methods will provide accurate structure annotations on the target side.

AVAILABILITY

Source code of the free software LocARNAscan 1.0 and supplementary data are available at http://www.bioinf.uni-leipzig.de/Software/LocARNAscan.
