Machine learning for RNA 2D structure prediction benchmarked on experimental data.

Affiliations

Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland.

Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland.

Publication information

Brief Bioinform. 2023 May 19;24(3). doi: 10.1093/bib/bbad153.

DOI:10.1093/bib/bbad153

PMID:37096592

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10199776/

Abstract

Since the 1980s, dozens of computational methods have addressed the problem of predicting RNA secondary structure. Among them are those that follow standard optimization approaches and, more recently, machine learning (ML) algorithms. The former were repeatedly benchmarked on various datasets. The latter, on the other hand, have not yet undergone extensive analysis that could suggest to the user which algorithm best fits the problem to be solved. In this review, we compare 15 methods that predict the secondary structure of RNA, of which 6 are based on deep learning (DL), 3 on shallow learning (SL) and 6 control methods on non-ML approaches. We discuss the ML strategies implemented and perform three experiments in which we evaluate the prediction of (I) representatives of the RNA equivalence classes, (II) selected Rfam sequences and (III) RNAs from new Rfam families. We show that DL-based algorithms (such as SPOT-RNA and UFold) can outperform SL and traditional methods if the data distribution is similar in the training and testing set. However, when predicting 2D structures for new RNA families, the advantage of DL is no longer clear, and its performance is inferior or equal to that of SL and non-ML methods.
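The abstract does not say which metric was used to evaluate the predictions. As a minimal sketch, assuming structures are written in dot-bracket notation and scored by base-pair precision, recall and F1 (a common choice in such benchmarks, not confirmed by this abstract), a predicted structure could be compared with a reference as follows. The function names `parse_pairs` and `pair_scores` are hypothetical and introduced only for illustration.

```python
def parse_pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string.

    Several bracket types are tracked on separate stacks so that
    pseudoknotted structures such as "([)]" can be parsed as well.
    """
    openers = {"(": ")", "[": "]", "{": "}", "<": ">"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks = {open_: [] for open_ in openers}
    pairs = set()
    for j, ch in enumerate(dot_bracket):
        if ch in openers:
            stacks[ch].append(j)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched closing bracket at position {j}")
            pairs.add((stack.pop(), j))
        # Any other character (e.g. ".") is treated as an unpaired residue.
    if any(stacks.values()):
        raise ValueError("unclosed opening bracket")
    return pairs


def pair_scores(reference, predicted):
    """Return (precision, recall, F1) of predicted base pairs vs. the reference."""
    if len(reference) != len(predicted):
        raise ValueError("structures must have the same length")
    ref_pairs = parse_pairs(reference)
    pred_pairs = parse_pairs(predicted)
    true_positives = len(ref_pairs & pred_pairs)
    precision = true_positives / len(pred_pairs) if pred_pairs else 0.0
    recall = true_positives / len(ref_pairs) if ref_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy example: the prediction recovers one of the two reference pairs.
    print(pair_scores("((..))", "(....)"))
```

Averaging such scores separately over each test set (for example, known families versus new families) is one way to expose the generalization gap the abstract describes. The toy strings above are illustrative and are not data from the paper.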
