Yu Junlei, Gao Wenjia, Chen Siqi, Lu Ronglin, Qiao Jianbo, Jin Junru, Wei Leyi, Shi Hua, Zhang Zilong, Cui Feifei, Jiang Xinbo, Yan Zhongmin
Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, 1500 Shunhua Road, High-Tech Industrial Development Zone, Jinan, Shandong 250101, China.
Centre for Artificial Intelligence driven Drug Discovery, Faculty of Applied Science, Macao Polytechnic University, Rua de Luís Gonzaga Gomes, Macao SAR, China.
Brief Bioinform. 2025 Aug 31;26(5). doi: 10.1093/bib/bbaf447.
Accurate identification of N7-methylguanosine (m7G) modification sites is critical for uncovering the regulatory mechanisms of biological processes such as human development and tumor initiation and progression. However, existing prediction methods still suffer from limited representational power, redundant feature fusion, insufficient use of biological prior knowledge, and poor interpretability. In this study, we propose a novel deep learning model named MCAMEF-BERT. The model adopts a parallel architecture that combines a branch built on the pretrained DNABERT-2 model with multiple traditional feature-encoding branches, so that sequence features are extracted from several complementary perspectives. To reduce redundancy in feature fusion, we introduce a multi-channel attention module. On datasets from m7GHub, the model outperforms other state-of-the-art classifiers in accuracy and effectiveness. We further validate the interpretability of MCAMEF-BERT through in silico saturation mutagenesis experiments and confirm its robustness in motif recognition. Finally, its generalization capability is validated across diverse RNA modification site prediction tasks.
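For readers unfamiliar with in silico saturation mutagenesis, the general idea can be illustrated with a minimal sketch. It is not the MCAMEF-BERT implementation: `toy_score` is a hypothetical stand-in for a trained classifier, the motif `GAC` is made up for illustration, and the RNA alphabet `ACGU` is an assumption. Every position of a sequence is substituted with each alternative nucleotide, and the change in the model score relative to the unmutated sequence is recorded.

```python
from typing import Callable, Dict, List

NUCLEOTIDES = "ACGU"  # assumed RNA alphabet; use "ACGT" for DNA-style input


def saturation_mutagenesis(
    seq: str, score_fn: Callable[[str], float]
) -> List[Dict[str, float]]:
    """For each position, map every nucleotide to (mutant score - reference score)."""
    reference_score = score_fn(seq)
    effects: List[Dict[str, float]] = []
    for pos, ref_base in enumerate(seq):
        row: Dict[str, float] = {}
        for alt in NUCLEOTIDES:
            if alt == ref_base:
                row[alt] = 0.0  # the reference base changes nothing
                continue
            mutant = seq[:pos] + alt + seq[pos + 1:]
            row[alt] = score_fn(mutant) - reference_score
        effects.append(row)
    return effects


def position_importance(effects: List[Dict[str, float]]) -> List[float]:
    """Summarize each position by its largest absolute score change."""
    return [max(abs(delta) for delta in row.values()) for row in effects]


def toy_score(seq: str) -> float:
    """Hypothetical classifier: 1.0 if the made-up motif 'GAC' occurs, else 0.0."""
    return 1.0 if "GAC" in seq else 0.0


effects = saturation_mutagenesis("AUGACUA", toy_score)
importance = position_importance(effects)
print(importance)
```

In this toy case only the three positions covering the motif receive a nonzero importance, which is the kind of signal used to check whether a model has learned a sequence motif. With a real model, `score_fn` would wrap the trained predictor's output probability.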