College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China; Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province, College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China.
College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China.
Int J Biol Macromol. 2024 Oct;278(Pt 2):134805. doi: 10.1016/j.ijbiomac.2024.134805. Epub 2024 Aug 15.
CircRNAs play vital roles in biological systems, mainly by binding RNA-binding proteins (RBPs), an interaction that is essential for regulating physiological processes in vivo and for identifying causal disease variants. Predicting interactions between circRNAs and RBPs is therefore a critical step in the discovery of new therapeutic agents. The application of deep-learning models in bioinformatics has significantly improved prediction and classification performance. However, most existing prediction models are applicable only to a specific type of RNA or to RNA with simple characteristics. In this study, we propose MSTCRB, a deep learning model based on a transformer and an attention mechanism that extracts multi-scale features to predict circRNA-RBP interactions. In this model, K-mer and KNF encoding are employed to capture the global sequence features of circRNA, NCP and DPCP encoding are used to extract local sequence features, and the CDPfold method is applied to extract structural features. To improve prediction performance, an optimized transformer framework and attention mechanism are used to integrate these multi-scale features. We compared our model's performance with that of five other state-of-the-art methods on 37 circRNA datasets and 31 linear RNA datasets. The results show that the average AUC of MSTCRB reaches 98.45%, higher than that of the compared methods. All of the above datasets are deposited at https://github.com/chy001228/MSTCRB_database.git and the source code is available from https://github.com/chy001228/MSTCRB.git.
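As a rough illustration of two of the sequence encodings named in the abstract, the following Python sketch computes a normalized k-mer frequency vector (a global feature) and a per-nucleotide NCP encoding (a local feature). It is not taken from the MSTCRB source code: the function names `kmer_frequencies` and `ncp_encode`, the default `k=3`, the handling of ambiguous bases, and the NCP table (a convention commonly used in the literature, based on ring structure, functional group, and hydrogen-bond strength) are all assumptions made for this example.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"

# Assumed NCP table (ring structure, functional group, hydrogen bonding).
# This is a commonly used convention, not necessarily the one in MSTCRB.
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def kmer_frequencies(seq, k=3):
    """Return the normalized frequency of every possible k-mer.

    The vector has 4**k entries in lexicographic order over ACGU.
    Windows containing characters outside ACGU are skipped.
    """
    seq = seq.upper().replace("T", "U")
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = Counter(w for w in windows if set(w) <= set(ALPHABET))
    total = sum(counts.values())
    vocab = ["".join(p) for p in product(ALPHABET, repeat=k)]
    return [counts[m] / total if total else 0.0 for m in vocab]


def ncp_encode(seq):
    """Map each nucleotide to its 3-dimensional chemical-property vector.

    Unknown bases (for example N) are encoded as (0, 0, 0), an assumption
    made here only so the output keeps one row per position.
    """
    seq = seq.upper().replace("T", "U")
    return [NCP.get(base, (0, 0, 0)) for base in seq]
```

For a sequence of length n, `kmer_frequencies` yields a fixed-length vector of 4^k values regardless of n, while `ncp_encode` yields an n x 3 matrix that preserves positional information, which is why the two are typically treated as global and local features respectively.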