College of Software, Jilin University, Changchun 130000, PR China.
College of Software, Jilin University, Changchun 130000, PR China; College of Computer Science and Technology, Jilin University, Jilin 130000, PR China.
Methods. 2022 Aug;204:368-375. doi: 10.1016/j.ymeth.2022.04.004. Epub 2022 Apr 28.
Access to RNA secondary structure is a prerequisite for understanding RNA function. RNA secondary structures play an important role in cells: they can cause or contribute to neurological disorders and have applications in the medical field. However, experimental methods for obtaining RNA secondary structure are costly, laborious, and not universally applicable. Although computational methods can predict the secondary structure of short-sequence RNAs fairly accurately, they cannot predict long-sequence RNAs or pseudoknots, which is currently the bottleneck of RNA secondary structure prediction. In recent years, researchers have attempted to use deep learning algorithms to predict RNA secondary structure and have achieved some results. However, the small amount of data on the secondary structure of long-sequence RNAs leads to low accuracy when deep learning methods predict RNA secondary structure across species. Similarly, RNA structures with pseudoknots are very complex, and insufficient data makes it difficult for deep learning algorithms to predict the secondary structure of RNAs containing pseudoknots. In this paper, RNA data are encoded into grayscale images by a unique encoding method based on the real RNA secondary structure and sequence information. The image data are then expanded to increase the amount of RNA data, which addresses the shortage of data for predicting long sequences and RNA secondary structures with pseudoknots in current deep learning methods and provides a good data foundation for deep learning. The article proposes a multi-scale feature fusion Conditional Deep Convolutional Generative Adversarial Network prediction model (MSFF-CDCGAN), based on an improved Conditional Deep Convolutional Generative Adversarial Network (CDCGAN) model, to predict RNA secondary structure. The experimental results showed that the MSFF-CDCGAN model could predict long-sequence RNAs and pseudoknots more accurately than traditional prediction methods.
This paper introduces Generative Adversarial Networks (GANs) to RNA secondary structure prediction for the first time. It uses a unique image encoding approach to expand the original RNA data set, thus transforming the structure prediction problem into an image analysis problem and effectively addressing the bottleneck in RNA secondary structure prediction.
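To make the idea of encoding sequence and structure as a grayscale image concrete, the sketch below shows one possible scheme. It is not the encoding used in the paper, which the abstract does not specify. The function names (`parse_pairs`, `encode_image`), the dot-bracket input format, and every intensity value are assumptions chosen for illustration only. The diagonal stores the nucleotide at each position, and the off-diagonal pixel (i, j) is lit when positions i and j are paired.

```python
# Hypothetical illustration: the abstract does not describe the encoding,
# so all names and grayscale values below are assumptions.

# Closing bracket -> opening bracket. Extra bracket types mark pseudoknots.
BRACKETS = {")": "(", "]": "[", "}": "{", ">": "<"}

# Assumed grayscale value for each nucleotide on the image diagonal.
BASE_GRAY = {"A": 64, "C": 128, "G": 192, "U": 255}

# Assumed grayscale value for each paired position, keyed by the unordered pair.
PAIR_GRAY = {
    frozenset("GC"): 255,
    frozenset("AU"): 170,
    frozenset("GU"): 85,
}
OTHER_PAIR_GRAY = 40  # any non-canonical pair


def parse_pairs(structure):
    """Return sorted (i, j) base pairs from dot-bracket notation.

    One stack per bracket type lets crossing pairs (pseudoknots) be parsed.
    """
    stacks = {opener: [] for opener in BRACKETS.values()}
    pairs = []
    for j, symbol in enumerate(structure):
        if symbol in stacks:
            stacks[symbol].append(j)
        elif symbol in BRACKETS:
            stack = stacks[BRACKETS[symbol]]
            if not stack:
                raise ValueError(f"unbalanced '{symbol}' at position {j}")
            pairs.append((stack.pop(), j))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol '{symbol}' at position {j}")
    if any(stacks.values()):
        raise ValueError("unclosed bracket in structure")
    return sorted(pairs)


def encode_image(sequence, structure):
    """Encode a sequence and its structure as an n x n grid of 0-255 values."""
    sequence = sequence.upper()
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    n = len(sequence)
    image = [[0] * n for _ in range(n)]
    # Diagonal: sequence information.
    for i, base in enumerate(sequence):
        image[i][i] = BASE_GRAY.get(base, 0)
    # Off-diagonal: structure information, mirrored so the image is symmetric.
    for i, j in parse_pairs(structure):
        pair = frozenset((sequence[i], sequence[j]))
        gray = PAIR_GRAY.get(pair, OTHER_PAIR_GRAY)
        image[i][j] = image[j][i] = gray
    return image


image = encode_image("GGGAAACCC", "(((...)))")
print(len(image), len(image[0]))  # image dimensions equal the sequence length
```

Under these assumptions a pseudoknot needs no special handling in the image itself: `parse_pairs("([)]")` returns `[(0, 2), (1, 3)]`, and the two crossing pairs simply appear as two lit pixels. A representation of this kind is what allows structure prediction to be treated as an image-to-image problem.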