Zhu Heqin, Tang Fenghe, Quan Quan, Chen Ke, Xiong Peng, Zhou S Kevin
School of Biomedical Engineering, Division of Life Sciences and Medicine, University of Science and Technology of China (USTC), Hefei, Anhui, 230026, China.
Suzhou Institute for Advanced Research, USTC, Suzhou, Jiangsu, 215123, China.
Nat Commun. 2025 Jul 1;16(1):5856. doi: 10.1038/s41467-025-60048-1.
Deep learning methods have demonstrated strong performance in RNA secondary structure prediction. However, generalization to unseen, out-of-distribution RNA families remains a common unsolved problem that limits further gains in the accuracy and robustness of these methods. Here we construct a base pair motif library that enumerates the complete space of locally adjacent three-neighbor base pairs and records the thermodynamic energy of each corresponding base pair motif, computed through de novo modeling of tertiary structures. We further develop a deep learning approach for RNA secondary structure prediction, named BPfold, which learns the relationship between an RNA sequence and the energy map of its base pair motifs. Experiments on sequence-wise and family-wise datasets show that BPfold outperforms other state-of-the-art approaches in both accuracy and generalizability. We hope this work contributes to integrating physical priors with deep learning methods for the further discovery of RNA structures and functions.
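The idea of a motif library feeding an energy map can be illustrated with a minimal sketch. It is not BPfold's actual library, motif definition, or API. It assumes a simplified motif (a canonical pair plus the three inner neighbors on each strand), a placeholder energy (a GC-count score, not a modeled thermodynamic value), and hypothetical names `build_toy_library` and `energy_map`.

```python
from itertools import product

import numpy as np

BASES = "ACGU"
# Watson-Crick pairs plus G-U wobble; treating these as "canonical" is an
# assumption of this sketch.
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}
K = 3  # neighbors taken per strand (assumed reading of "three-neighbor")


def build_toy_library(k=K):
    """Enumerate every (pair, 5' flank, 3' flank) key and attach a toy energy.

    The energy is a placeholder: -0.5 per G or C in the motif. A real library
    would store energies obtained from tertiary structure modeling.
    """
    library = {}
    for pair in sorted(CANONICAL):
        for left in product(BASES, repeat=k):
            for right in product(BASES, repeat=k):
                left_s, right_s = "".join(left), "".join(right)
                gc_count = sum(b in "GC" for b in pair + left_s + right_s)
                library[(pair, left_s, right_s)] = -0.5 * gc_count
    return library


def energy_map(seq, library, k=K):
    """Build a symmetric L x L map of motif energies for an ACGU sequence.

    Entry (i, j) is the library energy of the motif closed by pair (i, j);
    it stays 0 when the bases cannot pair or lie too close for both
    k-base inner flanks to fit without overlapping (a simplification).
    """
    n = len(seq)
    emap = np.zeros((n, n), dtype=np.float32)
    for i in range(n):
        for j in range(i + 2 * k + 1, n):
            pair = seq[i] + seq[j]
            if pair not in CANONICAL:
                continue
            key = (pair, seq[i + 1 : i + 1 + k], seq[j - k : j])
            emap[i, j] = emap[j, i] = library[key]
    return emap
```

Calling `energy_map("GGGAAAUCCC", build_toy_library())` yields a 10 x 10 matrix. A map of this shape is the kind of sequence-derived physical prior that a network could take as an additional input alongside the sequence itself.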