IEEE/ACM Trans Comput Biol Bioinform. 2020 May-Jun;17(3):972-980. doi: 10.1109/TCBB.2018.2874267. Epub 2018 Oct 5.
Emerging evidence has shown that RNAs play crucial roles in many cellular processes, and that their biological functions are achieved primarily by binding to a variety of proteins. High-throughput biological experiments provide valuable information for the initial identification of RNA-protein interactions (RPIs), but as RPI networks grow more complex, this experimental approach becomes increasingly expensive and time-consuming. There is therefore an urgent need for fast and reliable methods to predict RNA-protein interactions. In this study, we propose a computational method for predicting RNA-protein interactions from sequence information. A deep learning convolutional neural network (CNN) is used to mine hidden high-level discriminative features from the RNA and protein sequences, and these features are fed into an extreme learning machine (ELM) classifier. Experimental results with 5-fold cross-validation indicate that the proposed method achieves superior performance on the benchmark datasets RPI1807, RPI2241, and RPI369, with accuracies of 98.83, 90.83, and 85.63 percent, respectively. We further evaluate the proposed model by comparing it with the state-of-the-art SVM classifier and other existing methods on the same benchmark datasets. In addition, we used the model trained on RPI369 to predict interactions in the independent NPInter v2.0 dataset. The experimental results show that our model can serve as a useful tool for predicting RNA-protein interactions.
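The classification stage described in the abstract (an ELM applied to learned features, evaluated with 5-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the CNN architecture, hidden-layer size, or activation, so the sketch assumes the CNN features are already available as a numeric matrix `X`, and the names `ELMClassifier` and `five_fold_accuracy`, the `tanh` activation, and the 64 hidden units are all illustrative choices.

```python
import numpy as np


class ELMClassifier:
    """Minimal extreme learning machine (illustrative, not the paper's code).

    The hidden-layer weights are random and fixed; only the output
    weights are fitted, in closed form, by least squares.
    """

    def __init__(self, n_hidden=64, seed=0):
        self.n_hidden = n_hidden  # assumed size, not taken from the paper
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Random projection followed by a nonlinearity (tanh is an assumption).
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # One-hot targets, one column per class.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Output weights via the Moore-Penrose pseudoinverse.
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        scores = self._hidden(X) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]


def five_fold_accuracy(X, y, n_folds=5, seed=0):
    """Mean accuracy over a shuffled k-fold split (k = 5 by default)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    accuracies = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        model = ELMClassifier(seed=seed).fit(X[train], y[train])
        accuracies.append(np.mean(model.predict(X[test]) == y[test]))
    return float(np.mean(accuracies))
```

In the pipeline the abstract describes, `X` would hold the CNN-derived features for each RNA-protein pair and `y` the interaction labels; here any feature matrix with binary labels can stand in for them. Because the output weights are solved in one linear-algebra step rather than by iterative gradient descent, training an ELM is typically fast, which is consistent with the abstract's emphasis on speed.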