IEEE/ACM Trans Comput Biol Bioinform. 2022 Jul-Aug;19(4):2409-2419. doi: 10.1109/TCBB.2021.3083930. Epub 2022 Aug 8.
RNA-binding proteins (RBPs) are extensively involved in cellular regulatory processes through their interactions with RNAs. Capturing RBP binding preferences is fundamental to revealing the pathogenesis of complex diseases. Because many experimental detection techniques remain time-consuming and labor-intensive, a computational method with convincing accuracy is indispensable. In this study, we proposed DeepDW, a CNN-BLSTM hybrid deep learning framework that predicts RBP binding sites on RNAs from high-order encoded features of RNA sequence and secondary structure. The high-order encoding strategy was used to characterize the dependencies among adjacent nucleotides. In the hybrid model, DeepDW first employed two 1-D convolutional neural networks (CNNs) to learn local features from the high-order encoded matrices of RNA sequence and structure separately, and then applied two bidirectional long short-term memory networks (BLSTMs) to capture global information at a higher level. We evaluated the framework in a series of experiments on 31 public datasets, where DeepDW outperformed state-of-the-art methods. The results indicated that combining the high-order encoding method with the CNN-BLSTM hybrid model is advantageous for identifying RBP-RNA binding sites.
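The abstract does not spell out the high-order encoding itself. One plausible reading, offered here only as a minimal sketch, is that each window of k adjacent nucleotides is treated as a single symbol and one-hot encoded, so that a sequence becomes a matrix with one row per window and 4^k columns. The function name `high_order_one_hot`, the `order` parameter, the `ACGU` alphabet, and the all-zero handling of unknown symbols are assumptions made for illustration and are not taken from DeepDW.

```python
from itertools import product


def high_order_one_hot(seq, order=2, alphabet="ACGU"):
    """One-hot encode every overlapping k-mer (k = order) of seq.

    Hypothetical helper: returns a list of rows, one per window,
    each of length len(alphabet) ** order.
    """
    # Enumerate all possible k-mers in a fixed order: AA, AC, AG, AU, CA, ...
    kmers = ["".join(p) for p in product(alphabet, repeat=order)]
    index = {kmer: i for i, kmer in enumerate(kmers)}

    rows = []
    for start in range(len(seq) - order + 1):
        row = [0] * len(kmers)
        kmer = seq[start:start + order]
        # Assumption: windows containing symbols outside the alphabet
        # (for example N) are left as all-zero rows.
        if kmer in index:
            row[index[kmer]] = 1
        rows.append(row)
    return rows


encoded = high_order_one_hot("ACGUA", order=2)
print(len(encoded), len(encoded[0]))  # 4 windows, 16 columns
```

Under the same assumption, a secondary-structure string could be encoded with this function by passing its own symbol set as `alphabet`. The resulting matrices would play the role of the inputs that the abstract describes for the two 1-D CNNs.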