Enhanced Generalizability of RNA Secondary Structure Prediction via Convolutional Block Attention Network and Ensemble Learning.

Author information

Lin Hanbo, Hou Dongyue, Li Zhaoyite, Wang Shuaiqi, Liu Yuchen, Gu Jiajie, Qian Juncheng, Yin Ruining, Zhao Hui, Wang Shaofei, Chen Yuzong, Ju Dianwen, Zeng Xian

Affiliations

School of Pharmaceutical Sciences, Shanghai Engineering Research Center of Immunotherapeutics, Fudan University, Shanghai 201203, China.

The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China.

Publication information

Molecules. 2025 Aug 21;30(16):3447. doi: 10.3390/molecules30163447.

Abstract

Determining RNA secondary structure (RSS) can help elucidate RNA's functional mechanisms, guide the design of RNA-based therapeutics, and advance synthetic biology applications. However, traditional experimental methods for determining RSS, such as NMR, are typically time-consuming and labor-intensive. As a result, accurate prediction of RSS remains a fundamental yet unmet need in RNA research. Various deep learning (DL)-based methods have achieved higher accuracy than thermodynamics-based methods. However, the over-parameterized nature of DL models makes these methods prone to overfitting and thus limits their generalizability. Meanwhile, the inconsistency of RSS predictions across these methods further aggravates the generalizability crisis. Here, we propose TrioFold, which enhances the generalizability of RSS prediction by integrating base-pairing clues learned from both thermodynamics- and DL-based methods through ensemble learning and a convolutional block attention mechanism. TrioFold achieves higher accuracy in intra-family predictions and enhanced generalizability in inter-family and cross-RNA-type predictions. Additionally, we have developed an online webserver equipped with widely used RSS prediction algorithms and analysis tools, providing an accessible platform for the RNA research community. This study demonstrates new opportunities to improve the generalizability of RSS prediction through efficient ensemble learning of base-pairing clues from both thermodynamics- and DL-based algorithms.
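To make the idea of attention-weighted ensembling more concrete, here is a minimal sketch in plain Python. It is not the TrioFold architecture, which the abstract does not specify. It assumes each base predictor outputs an L×L base-pairing probability matrix, treats those matrices as channels, and applies channel attention followed by spatial attention in the style of a convolutional block attention module. All weights, the 3×3 kernel size, the three example predictors, and the final averaging step are hypothetical placeholders; in a real model the weights would be learned.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(maps, w1, w2):
    """One weight per predictor: shared MLP over avg- and max-pooled descriptors."""
    flat = [[v for row in m for v in row] for m in maps]
    avg = [sum(f) / len(f) for f in flat]
    mx = [max(f) for f in flat]

    def mlp(x):
        hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]  # ReLU
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    return [sigmoid(a + b) for a, b in zip(mlp(avg), mlp(mx))]

def spatial_attention(maps, k_avg, k_max):
    """One weight per (i, j) position: zero-padded conv over channel-wise avg/max maps."""
    n, r = len(maps[0]), len(k_avg) // 2
    avg = [[sum(m[i][j] for m in maps) / len(maps) for j in range(n)] for i in range(n)]
    mx = [[max(m[i][j] for m in maps) for j in range(n)] for i in range(n)]
    att = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < n and 0 <= jj < n:
                        s += k_avg[di + r][dj + r] * avg[ii][jj]
                        s += k_max[di + r][dj + r] * mx[ii][jj]
            att[i][j] = sigmoid(s)
    return att

def attention_ensemble(maps, w1, w2, k_avg, k_max):
    """Fuse per-predictor pairing matrices into one symmetric score matrix."""
    ca = channel_attention(maps, w1, w2)
    scaled = [[[c * v for v in row] for row in m] for c, m in zip(ca, maps)]
    sa = spatial_attention(scaled, k_avg, k_max)
    n = len(maps[0])
    # Assumption: collapse channels by a plain mean; a trained model would learn this step.
    fused = [[sa[i][j] * sum(m[i][j] for m in scaled) / len(scaled)
              for j in range(n)] for i in range(n)]
    # Base pairing is symmetric, so average each entry with its transpose.
    return [[(fused[i][j] + fused[j][i]) / 2 for j in range(n)] for i in range(n)]

# Hypothetical pairing probabilities for a 4-nucleotide RNA from three predictors.
thermo = [[0, 0, 0, 0.9], [0, 0, 0.1, 0], [0, 0.1, 0, 0], [0.9, 0, 0, 0]]
dl_a = [[0, 0, 0, 0.7], [0, 0, 0.4, 0], [0, 0.4, 0, 0], [0.7, 0, 0, 0]]
dl_b = [[0, 0, 0, 0.8], [0, 0, 0, 0], [0, 0, 0, 0], [0.8, 0, 0, 0]]

# Hypothetical (untrained) weights: 3 channels -> 2 hidden units -> 3 channels.
w1 = [[0.5, 0.2, 0.3], [0.1, 0.6, 0.3]]
w2 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
k_avg = [[0.0, 0.1, 0.0], [0.1, 1.0, 0.1], [0.0, 0.1, 0.0]]
k_max = [[0.0, 0.1, 0.0], [0.1, 1.0, 0.1], [0.0, 0.1, 0.0]]

fused = attention_ensemble([thermo, dl_a, dl_b], w1, w2, k_avg, k_max)
for row in fused:
    print(" ".join(f"{v:.3f}" for v in row))
```

In this toy setup, the channel weights decide how much each predictor is trusted overall, while the spatial weights emphasize positions where the predictors jointly indicate a pair. The fused matrix would still need a decoding step (for example, thresholding under pairing constraints) to yield a secondary structure.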

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/390d/12388828/e36a1ad518e3/molecules-30-03447-g001.jpg