Fu Jianqi, Li Haohao, Kang Yanlei, Zhu Hancan, Huang Tiren, Li Zhong
School of Information Engineering, Huzhou University, Huzhou 313000, China.
College of Science, Zhejiang Sci-Tech University, Hangzhou 310018, China.
Genes (Basel). 2025 Feb 26;16(3):284. doi: 10.3390/genes16030284.
RNA research is critical for understanding gene regulation, disease mechanisms, and therapeutic development. Constructing effective RNA benchmark models for accurate downstream analysis has become a significant research challenge. This study proposes DRFormer, a robust benchmark model for downstream tasks on RNA sequences. DRFormer uses RNA sequences to construct novel vision features based on secondary structure and sequence distance. A SWIN model is pre-trained on these features to produce a SWIN-RNA submodel, which is then integrated with an RNA sequence model to form a multimodal model for downstream analysis. We conducted experiments on various RNA downstream tasks. In sequence classification, the Matthews correlation coefficient (MCC) reached 94.4%, surpassing the state-of-the-art RNAErnie model by 1.2%. In protein-RNA interaction prediction, DRFormer achieved an MCC of 0.492, outperforming advanced models such as BERT-RBP and PrismNet. In RNA secondary structure prediction, the F1 score was 0.690, exceeding the widely used SPOT-RNA model by 1%. Generalization experiments on DNA tasks also yielded satisfactory results. DRFormer is the first model for RNA sequence downstream analysis that uses structural features to build a vision model and integrates the sequence and vision models in a multimodal manner. This approach yields excellent prediction and analysis results, making it a valuable contribution to RNA research.
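The abstract does not specify how the vision features are laid out, so the following is only a minimal sketch of one plausible reading of "vision features based on secondary structure and sequence distance". It assumes the secondary structure is already available in dot-bracket notation (for example from an external folding tool) and builds two image-like channels: a base-pair contact map and a normalized sequence-distance map. The function names `pair_table` and `structure_distance_image` and the two-channel layout are hypothetical illustrations, not DRFormer's actual implementation.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def pair_table(dot_bracket: str) -> List[int]:
    """Return the pairing partner of each position, or -1 if unpaired.

    Assumes plain dot-bracket notation using only '(', ')' and '.'
    (no pseudoknot brackets).
    """
    partners = [-1] * len(dot_bracket)
    stack: List[int] = []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            j = stack.pop()
            partners[i] = j
            partners[j] = i
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return partners


def structure_distance_image(dot_bracket: str) -> Tuple[Matrix, Matrix]:
    """Build two n x n channels from a dot-bracket string (hypothetical layout).

    Channel 1 (contact): 1.0 where positions i and j form a base pair.
    Channel 2 (distance): |i - j| scaled to the range [0, 1].
    """
    n = len(dot_bracket)
    partners = pair_table(dot_bracket)
    contact = [
        [1.0 if partners[i] == j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scale = max(n - 1, 1)  # avoid division by zero for length-1 input
    distance = [
        [abs(i - j) / scale for j in range(n)]
        for i in range(n)
    ]
    return contact, distance


if __name__ == "__main__":
    contact, distance = structure_distance_image("((..))")
    for row in contact:
        print(" ".join(str(int(v)) for v in row))
```

Stacking such channels gives a fixed-format input that an image backbone such as SWIN could consume after resizing or padding; how DRFormer actually encodes and combines these features is described in the full paper, not in this abstract.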