Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.
Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China.
PLoS Comput Biol. 2022 Jan 20;18(1):e1009798. doi: 10.1371/journal.pcbi.1009798. eCollection 2022 Jan.
Circular RNAs (circRNAs) are non-coding RNAs with a distinctive closed-loop structure formed by back-splicing (reverse splicing). Increasing evidence shows that circRNAs can bind directly to RNA-binding proteins (RBPs) and play an important role in a variety of biological activities. The interactions between circRNAs and RBPs are key to understanding the mechanism of posttranscriptional regulation, and accurately identifying binding sites is an important step in analyzing these interactions. Several machine learning (ML) predictors have been proposed in past research, but their prediction accuracy still needs to be improved. We therefore present a novel computational model, CRBPDL, which uses an AdaBoost-integrated deep hierarchical network to identify circRNA-RBP binding sites. CRBPDL combines five different feature encoding schemes to encode the original RNA sequence, and uses deep multiscale residual networks (MSRN) and bidirectional gated recurrent units (BiGRUs) to learn high-level feature representations, so that local and global context information is extracted at the same time. Additionally, a self-attention mechanism is employed to improve the robustness of CRBPDL. Finally, the AdaBoost algorithm is applied to integrate the deep learning (DL) models, improving the prediction performance and reliability of the model. To verify the usefulness of CRBPDL, we compared its performance with state-of-the-art methods on 37 circular RNA data sets and 31 linear RNA data sets. The results show that CRBPDL is general, reliable, and robust. The code and data sets are available at https://github.com/nmt315320/CRBPDL.git.
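To make the sequence-encoding step concrete, the following is a minimal sketch of two encodings commonly applied to RNA sequences: one-hot encoding and k-mer frequency. The abstract does not name the five schemes used by CRBPDL, so these two are illustrative assumptions, and the function names `one_hot` and `kmer_frequencies` are hypothetical rather than taken from the CRBPDL repository.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA nucleotides; DNA-style "T" is mapped to "U" below


def one_hot(seq):
    """Encode each nucleotide as a 4-dim indicator vector.

    Bases outside ACGU (e.g. "N") become all-zero vectors.
    Illustrative only; not necessarily one of CRBPDL's five schemes.
    """
    seq = seq.upper().replace("T", "U")
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]


def kmer_frequencies(seq, k=3):
    """Return the relative frequency of every possible k-mer over ACGU.

    The output has 4**k entries in lexicographic order of ALPHABET.
    Windows containing a base outside ACGU are skipped.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]
```

A one-hot matrix preserves position, which suits convolutional and recurrent layers, whereas k-mer frequencies summarize composition in a fixed-length vector; combining encodings of both kinds is one plausible reason for using several schemes together.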