Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Santosh, Tangail, 1902, Bangladesh.
Department of Computer Science, American International University-Bangladesh (AIUB), Kuratoli, Dhaka, 1229, Bangladesh.
Comput Biol Med. 2022 Jun;145:105433. doi: 10.1016/j.compbiomed.2022.105433. Epub 2022 Mar 30.
Accurate identification of DNA-binding proteins (DBPs) is critical both for understanding protein function and for drug design. DBPs play essential roles in biological processes such as DNA replication, repair, transcription, and splicing. Because experimental identification of DBPs is time-consuming and can be biased, an effective DBP prediction model is urgently needed, and computational methods that accurately predict potential DBPs from sequence information are highly desirable. In this paper, we develop a novel predictor, DeepDNAbP, that predicts DBPs from sequences using a convolutional neural network (CNN) model. First, we apply three feature extraction methods, namely position-specific scoring matrix (PSSM), pseudo-amino acid composition (PseAAC), and tripeptide composition (TPC), to represent protein sequence patterns. Second, SHapley Additive exPlanations (SHAP) is used to remove features that are redundant or irrelevant for predicting DBPs. Finally, the selected features are fed to the CNN classifier to construct the DeepDNAbP model for identifying DBPs. The final DeepDNAbP predictor achieves superior prediction performance in K-fold cross-validation tests and outperforms existing DBP predictors. DeepDNAbP is poised to be a powerful computational resource for the prediction of DBPs. The web application and curated datasets used in this study are freely available at: http://deepdbp.sblog360.blog/.
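As an illustration of one of the three feature types named above, tripeptide composition (TPC) can be sketched as the relative frequency of each of the 20^3 = 8000 possible tripeptides in a sequence. This is a minimal sketch and not the DeepDNAbP implementation: the function name `tripeptide_composition`, the handling of non-standard residues (windows containing them are skipped), and the normalization by the number of valid windows are assumptions made for the example.

```python
from itertools import product

# The 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 8000 tripeptides, with a lookup from tripeptide to vector index.
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
INDEX = {tri: i for i, tri in enumerate(TRIPEPTIDES)}


def tripeptide_composition(sequence):
    """Return an 8000-dimensional TPC vector for a protein sequence.

    Hypothetical helper: windows containing a non-standard residue
    (for example 'X') are skipped, and counts are normalized by the
    number of valid windows. Both choices are assumptions.
    """
    seq = sequence.upper()
    counts = [0] * len(TRIPEPTIDES)
    valid_windows = 0
    # Slide a window of length 3 over the sequence.
    for i in range(len(seq) - 2):
        idx = INDEX.get(seq[i:i + 3])
        if idx is not None:
            counts[idx] += 1
            valid_windows += 1
    if valid_windows == 0:
        # Sequence too short or made only of non-standard residues.
        return [0.0] * len(TRIPEPTIDES)
    return [c / valid_windows for c in counts]


if __name__ == "__main__":
    features = tripeptide_composition("ACDACD")
    print(len(features), features[INDEX["ACD"]])
```

In a pipeline like the one the abstract describes, a vector of this kind would be concatenated with the PSSM and PseAAC features before feature selection and classification.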