Li Nan, Sun Zhonghua, Jiang Fan
Beijing National Laboratory for Condensed Matter Physics, Institute of Physics, Chinese Academy of Sciences, Beijing, PR China.
BMC Bioinformatics. 2008 Dec 22;9:553. doi: 10.1186/1471-2105-9-553.
The prediction of protein-protein binding sites can provide structural annotation for the protein interaction data produced by proteomics studies. This is important for the biological application of such data, which are increasing rapidly. Moreover, methods for predicting protein interaction sites can provide crucial information for improving the speed and accuracy of protein docking methods.
In this work, we describe a binding site prediction method that uses a new residue neighbour profile and selects only the core-interface residues for SVM training. The residue neighbour profile includes both the sequential and the spatial neighbour residues of an interface residue, which gives a more complete description of the physical and chemical characteristics surrounding that residue. The concept of the core interface is applied in selecting the interface residues for training the SVM models, and this is shown to result in better discrimination between the core interface and other residues. The best SVM model trained was tested on a test set of 50 randomly selected proteins. The sensitivity, specificity, and MCC for the prediction of the core interface residues were 60.6%, 53.4%, and 0.243, respectively. Our prediction results on this test set were compared with those of three other binding site prediction methods and were found to be better. Furthermore, our method was tested on the 101 unbound proteins from the protein-protein interaction benchmark v2.0. The sensitivity, specificity, and MCC of this test were 57.5%, 32.5%, and 0.168, respectively.
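As a reading aid for the figures quoted above, the following minimal Python sketch shows how per-residue prediction metrics of this kind are commonly computed from confusion-matrix counts. It is an illustration under stated assumptions, not the authors' code: the function name `interface_metrics` and the example counts are hypothetical, and because the abstract does not define "specificity", the sketch reports both quantities that go by that name in the binding-site literature, namely precision, TP/(TP+FP), and the true negative rate, TN/(TN+FP).

```python
import math


def interface_metrics(tp, fp, tn, fn):
    """Compute per-residue metrics from confusion-matrix counts.

    tp: interface residues predicted as interface
    fp: non-interface residues predicted as interface
    tn: non-interface residues predicted as non-interface
    fn: interface residues predicted as non-interface

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Sensitivity (recall): fraction of true interface residues recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: fraction of predicted interface residues that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # True negative rate: fraction of non-interface residues correctly rejected.
    true_negative_rate = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient; set to 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "true_negative_rate": true_negative_rate,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Made-up counts for demonstration only; these are not data from the study.
    for name, value in interface_metrics(tp=6, fp=4, tn=8, fn=2).items():
        print(f"{name}: {value:.3f}")
```

MCC is reported alongside sensitivity and specificity because interface residues are typically a minority of surface residues, so a single rate can look favourable even when the overall classification is close to random.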
By improving both the description of the interface residues and their surrounding environment and the training strategy, we obtained better SVM models that outperformed previous methods. Our tests on the unbound protein structures suggest that further improvement is possible.