Yang Jiyuan, Sato Kengo, Loza Martin, Park Sung-Joon, Nakai Kenta
Department of Computer Science, the Graduate School of Information Science and Technology, the University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, 113-8656, Tokyo, Japan.
School of Life Science and Technology, Institute of Science Tokyo, 2-12-1-M6-12, Ookayama, Meguro-ku, 152-8550, Tokyo, Japan.
Comput Struct Biotechnol J. 2025 Apr 4;27:1449-1459. doi: 10.1016/j.csbj.2025.04.001. eCollection 2025.
Generating valid predictions of RNA secondary structures is challenging. Several deep learning methods have been developed for this task, but they commonly rely on post-processing steps to adjust the model output into valid predictions; these steps are complicated and may limit performance. In this study, we propose a simple method that treats RNA secondary structure prediction as multiple multi-class classifications, which eliminates the need for such post-processing. We use this method to train and evaluate a model based on the attention mechanism and a convolutional neural network. We also introduce two additional methods: data augmentation, which further improves within-RNA-family performance, and a method that alleviates the performance drop in cross-RNA-family evaluation. In summary, our approach produces valid predictions and achieves better performance without complex post-processing, and the additional methods benefit performance in both within-RNA-family and cross-RNA-family evaluations.
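The idea of casting structure prediction as multiple multi-class classifications can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: each base gets one classification over its possible partners, the diagonal entry stands for "unpaired", and a pair is kept only when both bases choose each other. The names `decode_pairs`, `to_dot_bracket`, and `example_scores` are hypothetical and are not taken from the paper.

```python
import numpy as np


def decode_pairs(scores):
    """Decode base pairs from an (L, L) score matrix.

    Assumed convention (not from the paper): scores[i, j] is the score
    that base i pairs with base j, and scores[i, i] is the score that
    base i is unpaired. Each row is one multi-class decision.
    """
    scores = np.asarray(scores, dtype=float)
    partner = scores.argmax(axis=1)  # one class choice per base
    pairs = []
    for i, j in enumerate(partner):
        j = int(j)
        # Keep a pair only if both bases pick each other (illustrative
        # choice), so every base appears in at most one pair.
        if i < j and partner[j] == i:
            pairs.append((i, j))
    return pairs


def to_dot_bracket(pairs, length):
    """Render pairs in dot-bracket notation (pseudoknots not handled)."""
    symbols = ["."] * length
    for i, j in pairs:
        symbols[i] = "("
        symbols[j] = ")"
    return "".join(symbols)


# Toy scores for a 6-base sequence: bases 0-5 and 1-4 pair, 2 and 3 do not.
example_scores = np.zeros((6, 6))
example_scores[0, 5] = example_scores[5, 0] = 1.0
example_scores[1, 4] = example_scores[4, 1] = 1.0
example_scores[2, 2] = example_scores[3, 3] = 1.0

example_pairs = decode_pairs(example_scores)
print(example_pairs, to_dot_bracket(example_pairs, 6))
```

Because each base makes exactly one choice, no base can end up in more than one pair, which is the kind of validity constraint that would otherwise be enforced by a separate post-processing step. How the published model handles disagreement between two bases is not stated in the abstract.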