Department of Biochemistry, The George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Israel.
Bioinformatics. 2010 Mar 1;26(5):692-3. doi: 10.1093/bioinformatics/btq019. Epub 2010 Jan 19.
The iDBPs server uses the three-dimensional (3D) structure of a query protein to predict whether it binds DNA. First, the algorithm predicts the functional region of the protein based on its evolutionary profile; the assumption is that large clusters of conserved residues are good markers of functional regions. Next, various characteristics of the predicted functional region as well as global features of the protein are calculated, such as the average surface electrostatic potential, the dipole moment and cluster-based amino acid conservation patterns. Finally, a random forests classifier is used to predict whether the query protein is likely to bind DNA and to estimate the prediction confidence. We have trained and tested the classifier on various datasets and shown that it outperformed related methods. On a dataset that reflects the fraction of DNA binding proteins (DBPs) in a proteome, the area under the ROC curve was 0.90. The application of the server to an updated version of the N-Func database, which contains proteins of unknown function with solved 3D-structure, suggested new putative DBPs for experimental studies.
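As a rough illustration of two quantities named in the abstract, the sketch below computes a dipole moment from point charges and an area under the ROC curve from classifier scores. It is not the iDBPs implementation: the function names, the point-charge model, the choice of the geometric centre as origin and the toy inputs are all assumptions made for this example. The AUC is computed using its standard interpretation as the probability that a randomly chosen positive (here, a DBP) receives a higher score than a randomly chosen negative.

```python
import math


def dipole_moment(charges, coords):
    """Magnitude of the dipole moment of a set of point charges.

    charges: partial charges, one per atom (hypothetical input).
    coords:  (x, y, z) positions, one per atom.
    The moment is taken about the geometric centre (an assumption),
    which keeps the value well defined when the net charge is non-zero.
    """
    n = len(charges)
    centre = [sum(c[k] for c in coords) / n for k in range(3)]
    mu = [
        sum(q * (c[k] - centre[k]) for q, c in zip(charges, coords))
        for k in range(3)
    ]
    return math.sqrt(sum(m * m for m in mu))


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for DNA-binding, 0 for non-binding.
    scores: classifier confidence that the protein binds DNA.
    Each (positive, negative) pair counts 1 if the positive scores
    higher, 0.5 on a tie, and 0 otherwise.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
toy_dipole = dipole_moment([1.0, -1.0], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Because the pairwise AUC depends only on how positives rank against negatives, it does not depend on the fraction of positives in the dataset, which makes it a reasonable summary when DBPs are a small minority, as in a proteome-like dataset.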