DeepMC-iNABP: Deep learning for multiclass identification and classification of nucleic acid-binding proteins.

Author information

Cui Feifei, Li Shuang, Zhang Zilong, Sui Miaomiao, Cao Chen, El-Latif Hesham Abd, Zou Quan

Affiliations

School of Computer Science and Technology, Hainan University, Haikou 570228, China.

Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China.

Publication information

Comput Struct Biotechnol J. 2022 Apr 26;20:2020-2028. doi: 10.1016/j.csbj.2022.04.029. eCollection 2022.

Nucleic acid-binding proteins (NABPs), including DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs), play vital roles in gene expression. Accurate identification of these proteins is crucial. However, there are two existing challenges: one is the problem of ignoring DNA- and RNA-binding proteins (DRBPs), and the other is a cross-predicting problem referring to DBP predictors predicting DBPs as RBPs, and vice versa. In this study, we proposed a computational predictor, called DeepMC-iNABP, with the goal of solving these difficulties by utilizing a multiclass classification strategy and deep learning approaches. DBPs, RBPs, DRBPs and non-NABPs as separate classes of data were used for training the DeepMC-iNABP model. The results on test data collected in this study and two independent test datasets showed that DeepMC-iNABP has a strong advantage in identifying the DRBPs and has the ability to alleviate the cross-prediction problem to a certain extent. The web-server of DeepMC-iNABP is freely available at http://www.deepmc-inabp.net/. The datasets used in this research can also be downloaded from the website.
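The multiclass strategy described in the abstract can be illustrated with a minimal sketch. It is not the DeepMC-iNABP model itself: the function names, the class index order, and the example scores are assumptions made for illustration only. The sketch shows how two binding annotations collapse into four mutually exclusive labels, and how a single four-way output assigns each protein exactly one class, so that DRBP is a class in its own right rather than something two separate binary predictors must agree on.

```python
import math

# Class names follow the abstract; the index order is an assumption.
CLASSES = ["DBP", "RBP", "DRBP", "non-NABP"]


def label_from_binding(binds_dna: bool, binds_rna: bool) -> str:
    """Map two binding annotations to one of four mutually exclusive classes."""
    if binds_dna and binds_rna:
        return "DRBP"
    if binds_dna:
        return "DBP"
    if binds_rna:
        return "RBP"
    return "non-NABP"


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(scores):
    """Pick the single most probable class from four hypothetical model scores."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]


if __name__ == "__main__":
    print(label_from_binding(True, True))
    print(predict_class([0.1, 2.0, 0.3, -1.0]))
```

Because the four probabilities compete within one output, a protein cannot be labelled DBP and RBP at the same time by accident, which is one plausible reading of how a multiclass setup can reduce cross-prediction.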
