Analysis of energy-based algorithms for RNA secondary structure prediction.

Affiliation

Computer Science Department, University of British Columbia, Vancouver, BC, Canada.

Publication information

BMC Bioinformatics. 2012 Feb 1;13:22. doi: 10.1186/1471-2105-13-22.

DOI: 10.1186/1471-2105-13-22
PMID: 22296803
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3347993/
Abstract

BACKGROUND

RNA molecules play critical roles in the cells of organisms, including roles in gene regulation, catalysis, and protein synthesis. Since RNA function depends in large part on folded structure, much effort has been invested in developing accurate methods for predicting RNA secondary structure from the base sequence. Minimum free energy (MFE) predictions are widely used, based on the nearest neighbor thermodynamic parameters of Mathews, Turner et al. or those of Andronescu et al. Some recently proposed alternatives leverage partition function calculations to find the structure with maximum expected accuracy (MEA) or maximum pseudo-expected accuracy (pseudo-MEA). Advances in prediction methods are typically benchmarked on datasets of known reference structures using sensitivity, positive predictive value, and their harmonic mean, the F-measure. Since such benchmarks document progress in improving the accuracy of computational prediction methods, it is important to understand how measures of accuracy vary as a function of the reference datasets and whether advances in algorithms or thermodynamic parameters yield statistically significant improvements. Our work advances such understanding for the MFE and (pseudo-)MEA-based methods, with respect to the latest datasets and energy parameters.
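The accuracy measures named above can be made concrete with a minimal sketch. It assumes a structure is represented as a set of base pairs `(i, j)`; the function name `f_measure`, the toy structures, and the handling of empty structures (scores of 0) are illustrative assumptions, not conventions taken from the paper.

```python
def f_measure(predicted, reference):
    """Return (sensitivity, ppv, f) for two collections of base pairs (i, j).

    Assumption for this sketch: an empty predicted or reference
    structure yields a score of 0 for the affected measure.
    """
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)  # correctly predicted pairs

    # Sensitivity: fraction of reference pairs that were predicted.
    sensitivity = true_pos / len(reference) if reference else 0.0
    # Positive predictive value: fraction of predicted pairs that are correct.
    ppv = true_pos / len(predicted) if predicted else 0.0

    if sensitivity + ppv == 0:
        return sensitivity, ppv, 0.0
    # F-measure: harmonic mean of sensitivity and PPV.
    f = 2 * sensitivity * ppv / (sensitivity + ppv)
    return sensitivity, ppv, f


# Hypothetical toy structures: 3 of 4 reference pairs are recovered,
# and 3 of 5 predicted pairs are correct.
reference = {(1, 20), (2, 19), (3, 18), (4, 17)}
predicted = {(1, 20), (2, 19), (3, 18), (5, 16), (6, 15)}
sens, ppv, f = f_measure(predicted, reference)
print(f"sensitivity={sens:.3f} ppv={ppv:.3f} F={f:.3f}")
```

Because the F-measure is a harmonic mean, it stays low unless both sensitivity and positive predictive value are high, which is why it is a convenient single summary for benchmarking.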

RESULTS

We present three main findings. First, using the bootstrap percentile method, we show that the average F-measure accuracy of the MFE and (pseudo-)MEA-based algorithms, as measured on our largest datasets with over 2000 RNAs from diverse families, is a reliable estimate (within a 2% range with high confidence) of the accuracy of a population of RNA molecules represented by this set. However, average accuracy on smaller classes of RNAs, such as a class of 89 Group I introns used previously in benchmarking algorithm accuracy, is not reliable enough to draw meaningful conclusions about the relative merits of the MFE and MEA-based algorithms. Second, on our large datasets, the algorithm with the best overall accuracy is a pseudo-MEA-based algorithm of Hamada et al. that uses a generalized centroid estimator of base pairs. However, between MFE and other MEA-based methods, there is no clear winner, in the sense that the relative accuracy of the MFE versus MEA-based algorithms changes depending on the underlying energy parameters. Third, of the four parameter sets we considered, the best accuracy for the MFE-based, MEA-based, and pseudo-MEA-based methods is 0.686, 0.680, and 0.711, respectively (on a scale from 0 to 1, with 1 meaning perfect structure predictions), and is achieved with a thermodynamic parameter set derived by Andronescu et al. called BL* (named after the Boltzmann likelihood method by which the parameters were derived).
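The bootstrap percentile method mentioned in the first finding can be sketched as follows. This is a generic illustration, not the authors' code: the function name `bootstrap_percentile_ci`, the number of resamples, the 95% level, and above all the per-RNA F-measure values (drawn uniformly at random here) are assumptions. Only the set sizes, 2000 and 89, echo the abstract, and the output does not reproduce any result from the paper.

```python
import random


def bootstrap_percentile_ci(values, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_resamples))
    # Read the interval endpoints off the sorted resample means.
    lower = means[int(round((alpha / 2) * n_resamples))]
    upper = means[int(round((1 - alpha / 2) * n_resamples)) - 1]
    return lower, upper


# Synthetic per-RNA F-measures (NOT real data): one large, diverse set
# and one small class, both drawn from the same made-up distribution.
data_rng = random.Random(42)
large_set = [data_rng.uniform(0.3, 1.0) for _ in range(2000)]
small_set = [data_rng.uniform(0.3, 1.0) for _ in range(89)]

large_ci = bootstrap_percentile_ci(large_set)
small_ci = bootstrap_percentile_ci(small_set)
large_width = large_ci[1] - large_ci[0]
small_width = small_ci[1] - small_ci[0]
print(f"n=2000: CI width {large_width:.3f}")
print(f"n=89:   CI width {small_width:.3f}")
```

With the same underlying spread, the interval for the small set comes out several times wider than the one for the large set, which illustrates why averages over small RNA classes are a weak basis for ranking algorithms.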

CONCLUSIONS

Large datasets should be used to obtain reliable measures of the accuracy of RNA structure prediction algorithms, and average accuracies on specific classes (such as Group I introns and transfer RNAs) should be interpreted with caution, considering the relatively small size of currently available datasets for such classes. The accuracy of the MEA-based methods is significantly higher when using the BL* parameter set of Andronescu et al. than when using the parameters of Mathews and Turner, and there is no significant difference between the accuracy of MEA-based methods and MFE when using the BL* parameters. The pseudo-MEA-based method of Hamada et al. with the BL* parameter set significantly outperforms all other MFE and MEA-based algorithms on our large datasets.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe1d/3347993/b1c104f501f2/1471-2105-13-22-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe1d/3347993/c4227b20e107/1471-2105-13-22-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe1d/3347993/0b552a1086b0/1471-2105-13-22-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe1d/3347993/d7ee0e1bcf57/1471-2105-13-22-4.jpg