Research on RNA secondary structure predicting via bidirectional recurrent neural network.

Affiliations

School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China.

Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, 215009, China.

Publication information

BMC Bioinformatics. 2021 Sep 8;22(Suppl 3):431. doi: 10.1186/s12859-021-04332-z.

Abstract

BACKGROUND

RNA secondary structure prediction is an important research topic in bioinformatics. Predicting RNA secondary structure with pseudoknots has been proven to be an NP-hard problem. When predicting RNA secondary structure, traditional machine learning methods cannot effectively use information from RNA sequences of different lengths because of constraints in the models themselves. In addition, the numbers of paired and unpaired bases in RNA sequences differ greatly, and this imbalance between positive and negative samples can easily trap the model in a local optimum. To solve these problems, this paper proposes a variable-length dynamic bidirectional Gated Recurrent Unit (VLDB GRU) model. By introducing a flag vector, the model accepts sequences of different lengths. It also makes full use of the base information before and after the predicted base and avoids losing information to truncation. A weight vector that dynamically adjusts the loss function of each base during training addresses the sample imbalance problem.
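The flag vector and weight vector described above can be illustrated with a minimal sketch. It is not the authors' implementation: the padding symbol `PAD`, the helper names, and the inverse-frequency weighting scheme are all assumptions made for illustration. The abstract says only that the weights are adjusted dynamically and does not give the rule.

```python
import math

PAD = "N"  # hypothetical padding symbol, not taken from the paper


def pad_with_flags(seqs, pad=PAD):
    """Pad sequences to a common length and build a flag vector per sequence.

    A flag of 1 marks a real base and 0 marks padding, so no sequence
    has to be truncated to fit a fixed input length.
    """
    max_len = max(len(s) for s in seqs)
    padded = [s + pad * (max_len - len(s)) for s in seqs]
    flags = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]
    return padded, flags


def class_weights(labels, flags):
    """Inverse-frequency weights for paired (1) and unpaired (0) bases.

    This weighting rule is an assumption chosen for the example.
    """
    real = [y for y, f in zip(labels, flags) if f == 1]
    n = len(real)
    n_pos = sum(real)
    n_neg = n - n_pos
    w_pos = n / (2 * n_pos) if n_pos else 0.0
    w_neg = n / (2 * n_neg) if n_neg else 0.0
    return w_pos, w_neg


def masked_weighted_loss(probs, labels, flags, w_pos, w_neg):
    """Weighted binary cross-entropy averaged over real bases only.

    probs  : predicted probability that each base is paired
    labels : 1 if the base is paired, 0 if unpaired
    flags  : 1 for a real base, 0 for padding
    """
    eps = 1e-12  # guards against log(0)
    total, count = 0.0, 0
    for p, y, f in zip(probs, labels, flags):
        if f == 0:
            continue  # padded positions contribute nothing to the loss
        w = w_pos if y == 1 else w_neg
        total += -w * (y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
        count += 1
    return total / count if count else 0.0
```

Because padded positions are skipped, the loss for a short sequence is the same whether or not it has been padded, and the rarer class receives the larger weight.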

RESULTS

The proposed algorithm is compared with existing algorithms on five representative subsets of the RNA STRAND data set. The experimental results show that the accuracy and Matthews correlation coefficient of the method improve by 4.7% and 11.4%, respectively.
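For reference, the two reported metrics can be computed from per-base predictions as sketched below. The function names are hypothetical, and treating each base as one binary sample (paired = 1, unpaired = 0) is an assumption; the abstract does not state how the paper counts samples.

```python
import math


def confusion_counts(pred, true):
    """Count true/false positives and negatives for binary labels."""
    tp = sum(1 for p, t in zip(pred, true) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, true) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, true) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, true) if p == 0 and t == 1)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of bases whose label is predicted correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Unlike accuracy, the Matthews correlation coefficient stays informative when unpaired bases greatly outnumber paired ones, which is presumably why both metrics are reported.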

CONCLUSIONS

The introduced flag vector allows the model to effectively use the information before and after the predicted base in the RNA sequence; the introduced weight vector solves the sample imbalance problem. Compared with the other algorithms, the VLDB GRU algorithm proposed in this paper achieves the best prediction results.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a68/8427827/768fa5ee8906/12859_2021_4332_Fig1_HTML.jpg
