Zhang Hao, Zhang Chunhe, Li Zhi, Li Cong, Wei Xu, Zhang Borui, Liu Yuanning
College of Computer Science and Technology and Symbol Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun, China.
College of Computer Science and Technology, Changchun University of Science and Technology, Changchun, China.
Front Genet. 2019 May 22;10:467. doi: 10.3389/fgene.2019.00467. eCollection 2019.
In recent years, RNA secondary structure information has played an important role in research on RNA and gene function. Although some RNA secondary structures can be determined experimentally, in most cases efficient and accurate computational methods are still needed to predict them. Current prediction methods are mainly based on the minimum free energy algorithm, which iteratively searches for the optimal folding state of an RNA subject to minimum energy or other constraints. However, because of the complexity of the biological environment, a real RNA structure settles into a balanced state of biological potential energy rather than the optimal folding state that minimizes free energy. For a short RNA sequence, this equilibrium energy state is close to the minimum free energy state, so the minimum free energy algorithm predicts its secondary structure with relatively high accuracy. For a longer RNA sequence, however, repeated folding and the resulting structural complexity push the equilibrium far from the minimum free energy state, and the prediction accuracy for its secondary structure declines sharply. In this paper, we propose a novel RNA secondary structure prediction algorithm that combines a convolutional neural network model with a dynamic programming method to improve accuracy using large-scale RNA sequence and structure data. We analyze existing experimental RNA sequence and structure data to construct a deep convolutional network model, which extracts implicit features useful for classification from the large-scale data and predicts the pairing probability of each base in an RNA sequence. An enhanced dynamic programming method is then applied to these base-pairing probabilities to obtain the optimal RNA secondary structure.
Results indicate that our proposed method outperforms common RNA secondary structure prediction algorithms on three benchmark RNA families. Based on the characteristics of deep learning algorithms, it can be inferred that the proposed method has a prediction success rate 30% higher than that of other algorithms, and that such a method will be needed as the amount of real RNA structure data increases in the future.
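The second stage described above, turning per-base pairing probabilities into a structure with dynamic programming, can be illustrated with a minimal sketch. This is not the enhanced method of the paper, whose details the abstract does not give. It assumes a hypothetical network output of three probabilities per base (opening a pair, closing a pair, unpaired), allows only canonical and G-U wobble pairs, requires at least three unpaired bases inside a hairpin, and uses a Nussinov-style recursion that maximizes the summed probability of the chosen labels. The function name `fold_from_base_probs` and the constants are illustrative only. Pseudoknots are not handled.

```python
from typing import List, Tuple

# Assumption: canonical Watson-Crick pairs plus G-U wobble pairs are allowed.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
                 ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # Assumption: minimum number of unpaired bases enclosed by a pair.


def fold_from_base_probs(seq: str,
                         probs: List[Tuple[float, float, float]]) -> str:
    """Return a dot-bracket structure maximizing the summed label probability.

    probs[i] = (p_open, p_close, p_unpaired) for base i, a hypothetical
    stand-in for the per-base output of a trained classifier.
    """
    n = len(seq)
    # best[i][j]: best score for the subsequence seq[i..j] (inclusive).
    best = [[0.0] * n for _ in range(n)]
    # choice[i][j]: decision taken for base j when solving seq[i..j].
    choice = [[None] * n for _ in range(n)]

    for span in range(n):                 # span = j - i, short spans first
        for i in range(n - span):
            j = i + span
            # Option 1: base j stays unpaired.
            score = (best[i][j - 1] if j > i else 0.0) + probs[j][2]
            pick = ("unpaired", j)
            # Option 2: base j closes a pair opened by base k.
            for k in range(i, j - MIN_LOOP):
                if (seq[k], seq[j]) not in ALLOWED_PAIRS:
                    continue
                before = best[i][k - 1] if k > i else 0.0
                inside = best[k + 1][j - 1]
                cand = before + inside + probs[k][0] + probs[j][1]
                if cand > score:
                    score, pick = cand, ("pair", k)
            best[i][j] = score
            choice[i][j] = pick

    # Iterative traceback from the full interval down to empty intervals.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i > j:
            continue
        kind, k = choice[i][j]
        if kind == "unpaired":
            stack.append((i, j - 1))
        else:
            structure[k], structure[j] = "(", ")"
            stack.append((i, k - 1))
            stack.append((k + 1, j - 1))
    return "".join(structure)


if __name__ == "__main__":
    # Toy probabilities: three opening bases, a four-base loop, three closing bases.
    toy = ([(0.9, 0.05, 0.05)] * 3 + [(0.05, 0.05, 0.9)] * 4
           + [(0.05, 0.9, 0.05)] * 3)
    print(fold_from_base_probs("GGGAAAACCC", toy))  # prints (((....)))
```

The recursion takes cubic time in the sequence length, which is typical for this family of algorithms. Because the score is a sum of predicted probabilities rather than a free energy, the quality of the result depends entirely on the classifier that supplies `probs`.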