School of Internet of Things Engineering, Jiangnan University, Wuxi, China.
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China.
Mol Inform. 2020 Aug;39(8):e2000006. doi: 10.1002/minf.202000006. Epub 2020 Mar 23.
DNA-binding proteins play essential roles in many molecular functions and in gene regulation. It is therefore highly desirable to develop effective computational techniques for detecting DNA-binding proteins. In this paper, we propose a new method, iDBP-DEP, which predicts DNA-binding proteins using discriminative features derived from multi-view feature sources. These sources include the evolutionary profile, dipeptide composition, and physicochemical properties, and the method combines them with feature selection. We evaluated iDBP-DEP on two benchmark datasets, PDB1075 and PDB594, using the rigorous jackknife test. Compared with state-of-the-art sequence-based DNA-binding predictors, iDBP-DEP improved accuracy (Acc) by 1.8% and Matthews correlation coefficient (MCC) by 3.0% on the PDB1075 dataset. On PDB594 it improved Acc by 7.4% and MCC by 14.8%. An independent validation test on PDB186 shows that the proposed method achieved the best performance in both Acc (80.1%) and MCC (0.684), further demonstrating the robustness of iDBP-DEP for detecting DNA-binding proteins. The datasets and code used in this study are freely available at https://githup.com/Zll-codeside/iDBP-DEP.
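The abstract names dipeptide composition as one feature view and reports results as Acc and MCC under a jackknife (leave-one-out) test. Below is a minimal Python sketch of those three pieces. It assumes a 400-dimensional dipeptide frequency vector normalized by the number of valid residue pairs, binary labels (1 = DNA-binding, 0 = non-binding), and a caller-supplied classifier. All function names here (`dipeptide_composition`, `jackknife_predictions`, `confusion_counts`, `acc_mcc`) are hypothetical. This is not the iDBP-DEP implementation, which also uses evolutionary profiles, physicochemical properties, and feature selection that are not shown.

```python
from itertools import product
from math import sqrt
from typing import Callable, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 400 ordered residue pairs, in a fixed order (AA, AC, ..., YY).
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq: str) -> List[float]:
    """Frequency of each adjacent residue pair (assumed normalization:
    divide by the number of pairs made only of standard residues)."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    seq = seq.upper()
    for a, b in zip(seq, seq[1:]):
        pair = a + b
        if pair in counts:  # skip pairs containing non-standard residues, e.g. 'X'
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def jackknife_predictions(
    X: List[Sequence[float]],
    y: List[int],
    fit_predict: Callable[[List[Sequence[float]], List[int], Sequence[float]], int],
) -> List[int]:
    """Leave-one-out test: train on all samples but one, predict the held-out one."""
    preds = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        preds.append(fit_predict(train_X, train_y, X[i]))
    return preds


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def acc_mcc(tp: int, tn: int, fp: int, fn: int) -> Tuple[float, float]:
    """Accuracy and Matthews correlation coefficient (MCC is 0 if undefined)."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc
```

For example, passing a simple 1-nearest-neighbour rule as `fit_predict` gives leave-one-out predictions. Feeding those predictions to `confusion_counts` and then `acc_mcc` yields a jackknife Acc and MCC. Any real classifier can be substituted through the same callable interface.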