Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Japan.
Nat Commun. 2021 Feb 11;12(1):941. doi: 10.1038/s41467-021-21194-4.
Accurate predictions of RNA secondary structures can help uncover the roles of functional non-coding RNAs. Although machine learning-based models have achieved high prediction accuracy, overfitting is a common risk for such highly parameterized models. Here we show that overfitting can be minimized when RNA folding scores learnt using a deep neural network are integrated with Turner's nearest-neighbor free energy parameters. Training the model with thermodynamic regularization ensures that folding scores and the calculated free energy are as close as possible. In computational experiments designed for newly discovered non-coding RNAs, our algorithm (MXfold2) achieves the most robust and accurate predictions of RNA secondary structures among the algorithms compared, without sacrificing computational efficiency. The results suggest that integrating thermodynamic information could help improve the robustness of deep learning-based predictions of RNA secondary structure.
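The idea of thermodynamic regularization can be illustrated with a minimal sketch. This is not the MXfold2 implementation. It only assumes that the regularizer is a squared penalty on the gap between a learned folding score and the score implied by the calculated free energy, added to some structure-prediction loss. The function names, the `weight` parameter, and the sign convention (score taken as the negated free energy) are all hypothetical choices made for illustration.

```python
def thermodynamic_penalty(folding_score, free_energy, weight=1.0):
    """Squared gap between a learned folding score and a thermodynamic score.

    Assumption: a lower free energy means a more stable structure, so the
    thermodynamic score is taken here as the negated free energy.
    """
    thermo_score = -free_energy
    return weight * (folding_score - thermo_score) ** 2


def regularized_loss(structure_loss, folding_score, free_energy, weight=1.0):
    """Hypothetical training objective: a structure loss plus the penalty."""
    return structure_loss + thermodynamic_penalty(folding_score, free_energy, weight)


if __name__ == "__main__":
    # A folding score that matches the negated free energy incurs no penalty.
    print(thermodynamic_penalty(3.0, -3.0))
    # A mismatched score adds a penalty that grows with the squared gap.
    print(regularized_loss(1.0, 5.0, -3.0, weight=0.5))
```

Under these assumptions, the penalty vanishes only when the learned score agrees with the thermodynamic one, so a larger `weight` keeps the network closer to the free energy model and leaves it less room to fit idiosyncrasies of the training set.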