Tayara Hilal, Chong Kil To
IEEE/ACM Trans Comput Biol Bioinform. 2021 Nov-Dec;18(6):2526-2534. doi: 10.1109/TCBB.2020.2981335. Epub 2021 Dec 8.
RNA-binding proteins (RBPs) play a significant role in various regulatory tasks. However, the mechanism by which RBPs recognize their target RNA subsequences is still not clear. In recent years, several machine learning and deep learning-based computational models have been proposed for understanding the binding preferences of RBPs. These methods require integrating multiple features, such as secondary structure, with the raw RNA sequences, and their performance can be further improved. In this paper, we propose an efficient and simple convolutional neural network, RBPCNN, that relies on the combination of the raw RNA sequence and evolutionary information. We show that conservation scores (evolutionary information) for the RNA sequences can significantly improve the overall performance of the proposed predictor. In addition, the automatic extraction of the binding sequence motifs can enhance our understanding of the binding specificities of RBPs. The experimental results show that RBPCNN significantly outperforms the current state-of-the-art methods. More specifically, the average area under the receiver operating characteristic curve was improved by 2.67 percent and the mean average precision was improved by 8.03 percent. The datasets and results can be downloaded from https://home.jbnu.ac.kr/NSCL/RBPCNN.htm.