Bioinformatics Centre, Institute of Microbial Technology (IMTECH), Chandigarh, India.
BMC Bioinformatics. 2010 Jun 3;11:301. doi: 10.1186/1471-2105-11-301.
Guanosine triphosphate (GTP)-binding proteins play an important role in the regulation of G-proteins. Predicting the GTP-interacting residues in a protein is therefore one of the major challenges in computational biology. In this study, we attempt to develop a computational method that predicts GTP-interacting residues in a protein with high accuracy (Acc), precision (Prec) and recall (Rc).
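For readers less familiar with these measures, the following is a minimal sketch of how accuracy, precision, recall and the Matthews correlation coefficient (MCC, used for the results in this abstract) are conventionally computed from the counts of a binary confusion matrix. The function name `binary_metrics` and the example counts are illustrative assumptions; they are not taken from the study or from the GTPBinder code.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return standard binary classification metrics from confusion counts.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives (hypothetical inputs).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # MCC is defined as 0 here when any marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "mcc": mcc}


# Illustrative counts only; these are not results from the paper.
example = binary_metrics(tp=8, fp=2, tn=7, fn=3)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which may be why it is reported alongside accuracy when comparing models.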
All models developed in this study were trained and tested on a non-redundant (40% similarity) dataset using five-fold cross-validation. First, we developed neural network based models using single-sequence and PSSM profile input, which achieved a maximum Matthews correlation coefficient (MCC) of 0.24 (Acc 61.30%) and 0.39 (Acc 68.88%), respectively. Second, we developed support vector machine (SVM) based models using single-sequence and PSSM profile input, which achieved a maximum MCC of 0.37 (Prec 0.73, Rc 0.57, Acc 67.98%) and 0.55 (Prec 0.80, Rc 0.73, Acc 77.17%), respectively. In this work, we introduce for the first time the concept of predicting GTP-interacting dipeptides (two consecutive GTP-interacting residues) and tripeptides (three consecutive GTP-interacting residues). An SVM based model for predicting GTP-interacting dipeptides from the PSSM profile achieved an MCC of 0.64, with precision 0.87, recall 0.74 and accuracy 81.37%. Similarly, an SVM based model for predicting GTP-interacting tripeptides from the PSSM profile achieved an MCC of 0.70, with precision 0.93, recall 0.73 and accuracy 83.98%.
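The dipeptide and tripeptide definitions above can be illustrated with a short sketch that derives window-level labels from per-residue labels. It assumes that a window of k consecutive residues counts as positive only when every residue in it is GTP-interacting, which is one reading of the definition given here; the function name `consecutive_labels` and the toy label vector are hypothetical and do not reproduce the authors' actual preprocessing.

```python
def consecutive_labels(residue_labels, k):
    """Label each window of k consecutive residues.

    residue_labels: list of 0/1 flags, 1 = GTP-interacting residue.
    Returns one 0/1 label per window start position; a window is
    labeled 1 only if all k residues in it are interacting (assumption).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n_windows = len(residue_labels) - k + 1
    return [int(all(residue_labels[i:i + k])) for i in range(max(n_windows, 0))]


# Toy example: residues 3 to 5 (1-based) interact with GTP.
labels = [0, 0, 1, 1, 1, 0, 0]
dipeptides = consecutive_labels(labels, 2)   # [0, 0, 1, 1, 0, 0]
tripeptides = consecutive_labels(labels, 3)  # [0, 0, 1, 0, 0]
```

Under this assumption, isolated interacting residues never form a positive dipeptide or tripeptide, so the window-level task is not simply a relabeling of the single-residue task.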
These results show that the PSSM based method performs better than the single-sequence based method, and that the prediction models based on dipeptides or tripeptides are more accurate than the traditional model based on single residues. A web server, "GTPBinder" (http://www.imtech.res.in/raghava/gtpbinder/), based on the above models has been developed for predicting GTP-interacting residues in a protein.