Wang Yongtian, Shen Yewei, Li Jiahao, Wang Tao, Peng Jiajie, Shang Xuequn
School of Computer Science, Northwestern Polytechnical University, 1 Dongxiang Rd, Xi'an 710129, China.
Shenzhen Research Institute of Northwestern Polytechnical University, Sanhang Science & Technology Building, No. 45th, Gaoxin South 9th Road, Nanshan District, Shenzhen City 518057, China.
Nucleic Acids Res. 2025 Jun 6;53(11). doi: 10.1093/nar/gkaf533.
Analyzing RNA secondary structures plays a crucial role in elucidating the functional mechanisms of RNA. Despite advances in experimental RNA structure determination, these methods remain low-throughput and resource-intensive. While machine learning-based models have achieved remarkable prediction accuracy, challenges such as data scarcity and overfitting remain common. Here, we introduce a phased learning strategy that integrates RNA sequence and structural context information to mitigate the risk of overfitting, and that employs pairing constraints to train the model on folding scores. This approach accounts for both local and long-range nucleotide interactions, substantially improving the robustness of RNA secondary structure prediction. Our comprehensive analysis across multiple benchmark datasets demonstrated that our model (DSRNAFold) outperformed existing methods, particularly in pseudoknot recognition and chemical mapping activity prediction.
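The abstract does not specify how DSRNAFold encodes its pairing constraints. A common approach in deep-learning RNA folding models is to mask a predicted pair-score matrix so that only canonical (A-U, G-C) and G-U wobble pairs separated by a minimum hairpin loop can receive a nonzero score. The sketch below assumes exactly that rule (minimum loop of 3 nucleotides) plus a simple greedy decoder. The names `pairing_mask`, `constrain_scores` and `greedy_pairs` are hypothetical, and this is not the authors' implementation.

```python
# Hypothetical sketch of pairing constraints applied to a pair-score matrix.
# ASSUMPTIONS: allowed pairs = Watson-Crick + G-U wobble; minimum hairpin loop = 3 nt.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3


def pairing_mask(seq):
    """Return an n x n 0/1 matrix with 1 where bases i and j may pair."""
    seq = seq.upper().replace("T", "U")  # treat DNA-style input as RNA
    n = len(seq)
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        # j must leave at least MIN_LOOP unpaired bases between i and j
        for j in range(i + MIN_LOOP + 1, n):
            if (seq[i], seq[j]) in ALLOWED_PAIRS:
                mask[i][j] = mask[j][i] = 1
    return mask


def constrain_scores(scores, mask):
    """Zero out scores of forbidden pairs (element-wise product)."""
    return [[s * m for s, m in zip(srow, mrow)] for srow, mrow in zip(scores, mask)]


def greedy_pairs(scores, threshold=0.5):
    """Hypothetical decoder: accept highest-scoring pairs, each base used at most once."""
    n = len(scores)
    candidates = sorted(
        ((scores[i][j], i, j) for i in range(n) for j in range(i + 1, n)
         if scores[i][j] > threshold),
        reverse=True,
    )
    used, pairs = set(), []
    for _, i, j in candidates:
        if i not in used and j not in used:
            pairs.append((i, j))
            used.update((i, j))
    return sorted(pairs)
```

For example, on `"GGGAAAUCCC"` with high raw scores on (0, 9), (1, 8), (2, 7) and (3, 6), the mask removes the A-U pair (3, 6) because it would close a loop of only two bases, and the decoder returns `[(0, 9), (1, 8), (2, 7)]`. Because this greedy decoder does not forbid crossing pairs, it can in principle output pseudoknotted structures, unlike classic nested dynamic-programming folding.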