Research on RNA secondary structure predicting via bidirectional recurrent neural network.

Affiliations

School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China.

Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, 215009, China.

Publication information

BMC Bioinformatics. 2021 Sep 8;22(Suppl 3):431. doi: 10.1186/s12859-021-04332-z.

PMID:34496763

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8427827/

Abstract

BACKGROUND

RNA secondary structure prediction is an important research topic in bioinformatics. Predicting RNA secondary structure with pseudoknots has been proven to be NP-hard. Because of constraints in the models themselves, traditional machine learning methods cannot effectively use information from sequences of different lengths when predicting RNA secondary structure. In addition, the numbers of paired and unpaired bases in RNA sequences differ greatly, and this imbalance between positive and negative samples can easily trap the model in a local optimum. To address these problems, this paper proposes a variable-length dynamic bidirectional Gated Recurrent Unit (VLDB GRU) model. By introducing a flag vector, the model accepts sequences of different lengths. It also makes full use of the bases before and after the base being predicted and avoids the information loss caused by truncation. A weight vector dynamically adjusts the loss for each base during training, which addresses the sample imbalance problem.
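
The two ideas in this paragraph, a flag vector that marks real bases versus padding and a weight vector that rescales the per-base loss, can be illustrated with a small sketch. This is not the authors' implementation: the binary paired/unpaired labels, the inverse-frequency weighting rule, and the function names `pad_batch`, `class_weights`, and `masked_weighted_bce` are all assumptions made for illustration, and the model that would produce the probabilities is left out.

```python
import math


def pad_batch(label_seqs, pad_value=0):
    """Pad label sequences to a common length and build flag vectors.

    Assumed encoding: flag 1 marks a real base, flag 0 marks padding.
    """
    max_len = max(len(seq) for seq in label_seqs)
    padded, flags = [], []
    for seq in label_seqs:
        n_pad = max_len - len(seq)
        padded.append(list(seq) + [pad_value] * n_pad)
        flags.append([1] * len(seq) + [0] * n_pad)
    return padded, flags


def class_weights(labels, flags):
    """Inverse-frequency weights for unpaired (0) and paired (1) bases.

    Only positions whose flag is 1 are counted, so padding does not
    distort the class statistics. This weighting rule is an assumption.
    """
    real = [y for row, frow in zip(labels, flags)
            for y, f in zip(row, frow) if f]
    total = len(real)
    n_pos = sum(real)
    n_neg = total - n_pos
    w_pos = total / (2 * n_pos) if n_pos else 0.0
    w_neg = total / (2 * n_neg) if n_neg else 0.0
    return w_neg, w_pos


def masked_weighted_bce(probs, labels, flags, w_neg, w_pos, eps=1e-12):
    """Weighted binary cross-entropy averaged over real positions only."""
    total_loss = 0.0
    total_weight = 0.0
    for prow, yrow, frow in zip(probs, labels, flags):
        for p, y, f in zip(prow, yrow, frow):
            if not f:
                continue  # padded position: contributes nothing
            w = w_pos if y == 1 else w_neg
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            total_loss += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
            total_weight += w
    return total_loss / total_weight if total_weight else 0.0
```

In this sketch, the predicted probabilities at padded positions have no effect on the loss, so sequences of different lengths can share a batch without truncation, and the rarer class receives the larger weight.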

RESULTS

The proposed algorithm is compared with existing algorithms on five representative subsets of the RNA STRAND data set. The experimental results show that the accuracy and Matthews correlation coefficient of the method improve by 4.7% and 11.4%, respectively.
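
For readers unfamiliar with the two reported metrics, the following sketch computes accuracy and the Matthews correlation coefficient (MCC) from per-base binary labels using the standard confusion-matrix formulas. The label encoding (1 for paired, 0 for unpaired) and the function names are assumptions for illustration; the abstract does not state how the authors computed these values, and the toy labels below are not data from the paper.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of positions predicted correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example: 2 paired and 6 unpaired bases (made-up labels).
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
always_unpaired = [0] * 8
counts = confusion_counts(y_true, always_unpaired)
print(accuracy(*counts), mcc(*counts))  # prints: 0.75 0.0
```

The toy example shows why MCC is reported alongside accuracy on imbalanced data: a predictor that labels every base as unpaired still reaches 75% accuracy here, while its MCC is 0.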

CONCLUSIONS

The flag vector allows the model to effectively use the information before and after each position in the sequence; the weight vector addresses the sample imbalance problem. Compared with the other algorithms, the VLDB GRU algorithm proposed in this paper achieves the best prediction results.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a68/8427827/768fa5ee8906/12859_2021_4332_Fig1_HTML.jpg
