IEEE J Biomed Health Inform. 2021 Sep;25(9):3668-3676. doi: 10.1109/JBHI.2021.3069259. Epub 2021 Sep 3.
RNA-binding proteins (RBPs) are powerful, wide-ranging regulators that play important roles in cell development, differentiation, metabolism, health, and disease. Predicting RBPs provides valuable guidance for biologists. Although experimental methods have made great progress in identifying RBPs, they are time-consuming and inflexible. We therefore developed rBPDL, a network model that combines a convolutional neural network with long short-term memory for multilabel classification of RBPs. To further improve prediction, we used a voting algorithm for ensemble learning of the model. We compared rBPDL with state-of-the-art methods and found that it significantly improved identification performance on the RBP68 dataset, with a macro-averaged area under the curve (AUC), micro-averaged AUC, and weighted AUC of 0.936, 0.962, and 0.946, respectively. Furthermore, a statistical analysis of AUC by RBP domain showed that identification performance was similar for RBPs sharing the same domain. In addition, we analyzed the performance preferences and physicochemical properties of the amino acids in the binding proteins and explored the characteristics that affect binding by using the RBP86 dataset.
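The three averaged AUC figures reported in the abstract follow standard multilabel conventions, which can be illustrated with a minimal sketch. The code below is not the authors' implementation: the function names (`binary_auc`, `soft_vote`, `multilabel_auc`) are hypothetical, and soft voting (averaging per-label scores across models) is an assumption, since the abstract does not state which voting rule rBPDL uses. Macro AUC is taken as the unweighted mean of per-label AUCs, weighted AUC as the mean weighted by the number of positives per label, and micro AUC as the AUC over all sample-label pairs pooled together.

```python
from typing import Dict, List, Sequence


def binary_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def soft_vote(model_scores: List[List[List[float]]]) -> List[List[float]]:
    """Average scores across models (assumed voting rule).

    model_scores[m][i][k] is model m's score for sample i and label k.
    """
    n_models = len(model_scores)
    n_samples = len(model_scores[0])
    n_labels = len(model_scores[0][0])
    return [
        [sum(m[i][k] for m in model_scores) / n_models for k in range(n_labels)]
        for i in range(n_samples)
    ]


def multilabel_auc(
    y_true: List[List[int]], y_score: List[List[float]]
) -> Dict[str, float]:
    """Macro, micro, and weighted AUC for a multilabel problem."""
    n_labels = len(y_true[0])
    # Transpose rows (samples) into columns (labels).
    cols_true = [[row[k] for row in y_true] for k in range(n_labels)]
    cols_score = [[row[k] for row in y_score] for k in range(n_labels)]

    per_label = [binary_auc(t, s) for t, s in zip(cols_true, cols_score)]
    support = [sum(t) for t in cols_true]  # positives per label

    macro = sum(per_label) / n_labels
    weighted = sum(a * w for a, w in zip(per_label, support)) / sum(support)
    # Micro: pool every (sample, label) pair into one binary problem.
    micro = binary_auc(
        [v for row in y_true for v in row],
        [v for row in y_score for v in row],
    )
    return {"macro": macro, "micro": micro, "weighted": weighted}
```

In use, the per-model score matrices would first be combined with `soft_vote`, and the result passed to `multilabel_auc` together with the binary label matrix. Because micro averaging pools all pairs, it is dominated by frequent labels, which is one reason a micro AUC can differ from the macro AUC on a dataset with unevenly sized classes.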