Evaluating the effect of disturbed ensemble distributions on SCFG based statistical sampling of RNA secondary structures.

Affiliation

Department of Computer Science, University of Kaiserslautern, D-67653 Kaiserslautern, P.O. Box 3049, Germany.

Publication information

BMC Bioinformatics. 2012 Jul 9;13:159. doi: 10.1186/1471-2105-13-159.

DOI: 10.1186/1471-2105-13-159
PMID: 22776037
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3871765/
Abstract

BACKGROUND

Over the past years, statistical and Bayesian approaches have become increasingly appreciated for addressing the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for predicting RNA secondary structures from a single sequence has been studied. It is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent) stochastic context-free grammar (SCFG) that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF) approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance, both in prediction accuracy and in the quality of the generated samples. Neither of the two competing approaches generally outperforms the other.
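To make "sampling foldings from a distribution implied by an SCFG" concrete, the following Python sketch draws structures by stochastic traceback from inside values. It uses a deliberately tiny, hypothetical grammar (S → ε | x S | x S y S). The rule weights `P_END`, `P_UNPAIRED` and `P_PAIR`, the minimum loop length `MIN_LOOP`, the pair set and the function names are all illustrative assumptions. This is neither the sophisticated grammar studied in the paper nor the Sfold algorithm, only a minimal instance of the same sampling idea.

```python
import random
from collections import Counter

# Toy, hypothetical grammar (NOT the grammar from the paper):
#   S -> eps | x S | x S y S   (x-y must form an allowed base pair)
# The rule weights are illustrative assumptions, not trained parameters.
P_END, P_UNPAIRED, P_PAIR = 0.2, 0.5, 0.3
MIN_LOOP = 3  # assumed minimum number of unpaired bases inside a hairpin
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def inside(seq):
    """Z[i][j] = summed weight of all derivations of seq[i:j] (half-open) from S."""
    n = len(seq)
    Z = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        Z[i][i] = P_END  # only S -> eps derives the empty string
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            total = P_UNPAIRED * Z[i + 1][j]  # S -> x S: base i unpaired
            for k in range(i + 1 + MIN_LOOP, j):  # S -> x S y S: i pairs with k
                if (seq[i], seq[k]) in PAIRS:
                    total += P_PAIR * Z[i + 1][k] * Z[k + 1][j]
            Z[i][j] = total
    return Z


def sample_structure(seq, Z, rng):
    """Stochastic traceback: draw one structure with probability weight / Z[0][n]."""
    structure = ["."] * len(seq)
    stack = [(0, len(seq))]
    while stack:
        i, j = stack.pop()
        if i == j:
            continue  # empty span: S -> eps, nothing to emit
        # Alternatives for seq[i:j]: None = base i unpaired, k = base i pairs with k.
        choices = [None]
        weights = [P_UNPAIRED * Z[i + 1][j]]
        for k in range(i + 1 + MIN_LOOP, j):
            if (seq[i], seq[k]) in PAIRS:
                choices.append(k)
                weights.append(P_PAIR * Z[i + 1][k] * Z[k + 1][j])
        # Each alternative is drawn with probability weight / Z[i][j].
        k = rng.choices(choices, weights=weights)[0]
        if k is None:
            stack.append((i + 1, j))
        else:
            structure[i], structure[k] = "(", ")"
            stack.extend([(i + 1, k), (k + 1, j)])
    return "".join(structure)


# Usage: draw 1000 structures for a short hairpin-forming sequence.
seq = "GGGAAAUCCC"
Z = inside(seq)
rng = random.Random(42)
counts = Counter(sample_structure(seq, Z, rng) for _ in range(1000))
for structure, count in counts.most_common(3):
    print(structure, count)
```

Because this toy grammar is unambiguous (each structure has exactly one derivation), the frequency of a structure in a large sample approximates its probability under the grammar. That is the property that makes sampled sets statistically representative.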

RESULTS

In this work, we consider the SCFG-based approach and analyze how the quality of the generated sample sets and the corresponding prediction accuracy change when different degrees of disturbance are introduced into the required sampling probabilities. The motivation is as follows. Suppose the results prove robust to large errors in the individual sampling probabilities (compared to the exact ones). This would indicate that these probabilities need not be computed exactly, and that approximating them may be sufficient and more efficient. It might then be possible to decrease the worst-case time requirements of such an SCFG-based sampling method without significant loss of accuracy. If, on the other hand, the quality of the sampled structures reacts strongly to slight disturbances, there is little hope of improving the complexity through heuristic procedures. We thus provide a reliable test of the hypothesis that a heuristic method could improve the worst-case time scaling of RNA secondary structure prediction without sacrificing much of the accuracy of the results.
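The abstract does not specify the disturbance model at this point. As a hedged illustration, the Python sketch below perturbs one vector of sampling probabilities under two assumed error models and renormalises the result. The first is additive ("absolute") uniform noise; the second is multiplicative ("relative") uniform noise. The function name `disturb`, the uniform noise and the example probabilities are hypothetical, and the paper's actual error model may differ. In an SCFG sampler, a function of this kind would be applied to the normalised weights of the alternatives considered at each traceback step.

```python
import random


def disturb(probs, eps, mode, rng):
    """Return a disturbed and renormalised copy of a probability vector.

    Assumed error models (illustrative; not necessarily the paper's):
      "absolute": p + u,       u ~ Uniform(-eps, eps), clipped at 0
      "relative": p * (1 + u), u ~ Uniform(-eps, eps), with 0 <= eps < 1
    """
    if mode == "absolute":
        noisy = [max(0.0, p + rng.uniform(-eps, eps)) for p in probs]
    elif mode == "relative":
        noisy = [p * (1.0 + rng.uniform(-eps, eps)) for p in probs]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    total = sum(noisy)
    if total == 0.0:
        # Every entry was clipped to 0; fall back to uniform (an arbitrary choice).
        return [1.0 / len(probs)] * len(probs)
    return [x / total for x in noisy]


# Sampling probabilities at one hypothetical traceback step, one of them tiny.
exact = [0.7, 0.299999, 0.000001]
rng = random.Random(7)
print("absolute:", disturb(exact, 0.05, "absolute", rng))
print("relative:", disturb(exact, 0.05, "relative", rng))
```

Comparing the third entry in the two printed vectors is a quick way to see how differently the two error types treat small probabilities.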

CONCLUSIONS

Our experiments indicate that absolute errors generally lead to the generation of useless sample sets. Relative errors, by contrast, seem to have only a small negative impact on both the predictive accuracy and the overall quality of the resulting structure samples. Based on these observations, we present some useful ideas for developing a time-reduced sampling method that guarantees acceptable predictive accuracy. We also discuss some inherent drawbacks that arise in the context of approximation. The key results of this paper are crucial for the design of an efficient and competitive heuristic prediction method based on the increasingly accepted and attractive statistical sampling approach. The construction of prototype algorithms has indeed indicated this.
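A short calculation suggests why the two error types can behave so differently. It rests on an assumed simple error model, not necessarily the paper's: the exact probabilities p_i at one sampling step satisfy Σ_j p_j = 1, each is disturbed by bounded noise u_i with |u_i| ≤ ε, and the result is renormalised.

```latex
% Relative errors: \tilde p_i = p_i (1 + u_i), \quad |u_i| \le \varepsilon < 1
% The normaliser is \sum_j \tilde p_j = 1 + \sum_j p_j u_j \in [1 - \varepsilon, 1 + \varepsilon], hence
\frac{\tilde p_i \big/ \sum_j \tilde p_j}{p_i}
  = \frac{1 + u_i}{1 + \sum_j p_j u_j}
  \in \left[ \frac{1 - \varepsilon}{1 + \varepsilon},\ \frac{1 + \varepsilon}{1 - \varepsilon} \right]

% Absolute errors: \tilde p_i = p_i + u_i, \quad |u_i| \le \varepsilon
% No bound of this form holds: for p_i = 10^{-6} and \varepsilon = 10^{-2},
% \tilde p_i can reach about 10^{-2} before renormalisation,
% i.e. roughly 10^{4} times the exact value.
```

Under these assumptions, relative errors distort every sampling probability by at most a bounded factor. Absolute errors, by contrast, can inflate small probabilities by orders of magnitude while leaving large ones almost unchanged.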

Similar articles

1
Evaluation of a sophisticated SCFG design for RNA secondary structure prediction.
Theory Biosci. 2011 Dec;130(4):313-36. doi: 10.1007/s12064-011-0139-7. Epub 2011 Dec 2.
2
A statistical sampling algorithm for RNA secondary structure prediction.
Nucleic Acids Res. 2003 Dec 15;31(24):7280-301. doi: 10.1093/nar/gkg938.
3
Statistical and Bayesian approaches to RNA secondary structure prediction.
RNA. 2006 Mar;12(3):323-31. doi: 10.1261/rna.2274106.
4
CONTRAfold: RNA secondary structure prediction without physics-based models.
Bioinformatics. 2006 Jul 15;22(14):e90-8. doi: 10.1093/bioinformatics/btl246.
5
A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more.
RNA. 2012 Feb;18(2):193-212. doi: 10.1261/rna.030049.111. Epub 2011 Dec 22.
6
TurboFold: iterative probabilistic estimation of secondary structures for multiple RNA sequences.
BMC Bioinformatics. 2011 Apr 20;12:108. doi: 10.1186/1471-2105-12-108.
7
Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction.
BMC Bioinformatics. 2004 Jun 4;5:71. doi: 10.1186/1471-2105-5-71.
8
Analysis of energy-based algorithms for RNA secondary structure prediction.
BMC Bioinformatics. 2012 Feb 1;13:22. doi: 10.1186/1471-2105-13-22.
9
SCFGs in RNA secondary structure prediction: a hands-on approach.
Methods Mol Biol. 2014;1097:143-62. doi: 10.1007/978-1-62703-709-9_8.

Cited by

1
Asymptotic distribution of motifs in a stochastic context-free grammar model of RNA folding.
J Math Biol. 2014 Dec;69(6-7):1743-72. doi: 10.1007/s00285-013-0750-y. Epub 2014 Jan 3.
