Lin Hanbo, Hou Dongyue, Li Zhaoyite, Wang Shuaiqi, Liu Yuchen, Gu Jiajie, Qian Juncheng, Yin Ruining, Zhao Hui, Wang Shaofei, Chen Yuzong, Ju Dianwen, Zeng Xian
School of Pharmaceutical Sciences, Shanghai Engineering Research Center of Immunotherapeutics, Fudan University, Shanghai 201203, China.
The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China.
Molecules. 2025 Aug 21;30(16):3447. doi: 10.3390/molecules30163447.
The determination of RNA secondary structure (RSS) can help elucidate RNA's functional mechanisms, guide the design of RNA-based therapeutics, and advance synthetic biology applications. However, traditional methods for determining RSS, such as NMR, are typically time-consuming and labor-intensive. As a result, accurate prediction of RSS remains a fundamental yet unmet need in RNA research. Various deep learning (DL)-based methods have achieved higher accuracy than thermodynamic-based methods. However, the over-parameterized nature of DL makes these methods prone to overfitting, which limits their generalizability. Meanwhile, inconsistencies among the RSS predictions of these methods further aggravate the generalizability crisis. Here, we propose TrioFold, which enhances the generalizability of RSS prediction by integrating base-pairing clues learned by both thermodynamic- and DL-based methods through ensemble learning and a convolutional block attention mechanism. TrioFold achieves higher accuracy in intra-family predictions and improved generalizability in inter-family and cross-RNA-type predictions. Additionally, we have developed an online webserver equipped with widely used RSS prediction algorithms and analysis tools, providing an accessible platform for the RNA research community. This study demonstrates new opportunities to improve the generalizability of RSS prediction through efficient ensemble learning of base-pairing clues from both thermodynamic- and DL-based algorithms.
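The abstract does not describe TrioFold's architecture in detail, so the following is only a minimal NumPy sketch, under our own assumptions, of how a CBAM-style attention block (channel attention followed by spatial attention) could re-weight base-pairing maps produced by several predictors before fusing them. Each predictor's output is assumed to be a dot-bracket string and becomes one channel of a stacked L x L pairing tensor. The function names (`dot_bracket_to_matrix`, `fuse_pairing_maps`) are hypothetical, and the weights are random placeholders standing in for parameters that would be learned during training; this is not the authors' implementation.

```python
import numpy as np


def dot_bracket_to_matrix(structure: str) -> np.ndarray:
    """Convert a dot-bracket string into a symmetric L x L 0/1 base-pair matrix."""
    n = len(structure)
    mat = np.zeros((n, n), dtype=float)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            mat[i, j] = mat[j, i] = 1.0
    return mat


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """CBAM-style channel attention: one weight per predictor channel. x has shape (C, L, L)."""
    avg = x.mean(axis=(1, 2))  # spatial average pooling -> (C,)
    mx = x.max(axis=(1, 2))    # spatial max pooling -> (C,)

    def mlp(v: np.ndarray) -> np.ndarray:
        # Shared two-layer MLP with ReLU, applied to both pooled vectors.
        return w2 @ np.maximum(w1 @ v, 0.0)

    weights = sigmoid(mlp(avg) + mlp(mx))  # (C,), each in (0, 1)
    return x * weights[:, None, None]


def spatial_attention(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """CBAM-style spatial attention: one weight per (i, j) position. kernel has shape (2, k, k)."""
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])  # pool across channels -> (2, L, L)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding keeps size L x L
    length = x.shape[1]
    logits = np.zeros((length, length))
    # 2-channel -> 1-channel convolution (cross-correlation, as in DL conv layers).
    for di in range(k):
        for dj in range(k):
            window = padded[:, di:di + length, dj:dj + length]
            logits += np.einsum("c,cij->ij", kernel[:, di, dj], window)
    return x * sigmoid(logits)[None, :, :]


def fuse_pairing_maps(structures, hidden=2, k=3, seed=0) -> np.ndarray:
    """Stack per-predictor pairing maps, apply channel then spatial attention, and average.

    Weights are random placeholders (hypothetical); in a real model they would be trained.
    """
    rng = np.random.default_rng(seed)
    x = np.stack([dot_bracket_to_matrix(s) for s in structures])  # (C, L, L)
    c = x.shape[0]
    w1 = rng.normal(size=(hidden, c))
    w2 = rng.normal(size=(c, hidden))
    kernel = rng.normal(size=(2, k, k))
    x = spatial_attention(channel_attention(x, w1, w2), kernel)
    score = x.mean(axis=0)
    return (score + score.T) / 2.0  # keep the fused map symmetric


if __name__ == "__main__":
    # Three hypothetical predictor outputs for the same 6-nt sequence.
    predictions = ["((..))", "(....)", "((..))"]
    print(np.round(fuse_pairing_maps(predictions), 3))
```

Because both attention stages only rescale their inputs by sigmoid weights in (0, 1), the fused map stays in [0, 1) and is nonzero only where at least one predictor proposed a pair; what training would add is how strongly each predictor and each region of the map is trusted.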