BACTER Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA.
Proteins. 2011 Sep;79(9):2671-83. doi: 10.1002/prot.23094. Epub 2011 Jul 6.
Hot spots constitute a small fraction of protein-protein interface residues, yet they account for a large fraction of the binding affinity. Building on our previous method (KFC), we present two new methods (KFC2a and KFC2b) that outperform other methods at hot spot prediction. Several improvements were made in developing these methods. First, we created a training data set containing a similar number of hot spot and non-hot spot residues. Second, we generated 47 different features and trained the models with different numbers of features to avoid over-fitting. Finally, two feature combinations were selected: one (used in KFC2a) is composed of eight features that are mainly related to solvent-accessible surface area and local plasticity; the other (used in KFC2b) is composed of seven features, only two of which are identical to those used in KFC2a. Both models were built using support vector machines (SVM). The two KFC2 models were then tested on a mixed independent test set and compared with other methods such as Robetta, FOLDEF, HotPoint, MINERVA, and KFC. KFC2a showed the highest predictive accuracy for hot spot residues (true positive rate, TPR = 0.85); however, its false positive rate was somewhat higher than that of the other models. KFC2b showed the best predictive accuracy for hot spot residues (TPR = 0.62) among all methods other than KFC2a, and its false positive rate (FPR = 0.15) was comparable with that of other highly predictive methods.
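The abstract compares methods by true positive rate (TPR) and false positive rate (FPR). As a minimal sketch of how these two metrics are computed for a binary hot spot classifier, the Python below counts confusion-matrix entries directly. The function name `tpr_fpr` and the toy labels are illustrative assumptions only; they are not taken from the KFC2 software or its test set.

```python
from typing import Sequence, Tuple


def tpr_fpr(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (TPR, FPR) for binary labels: 1 = hot spot, 0 = non-hot spot."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one positive and one negative label")

    tpr = tp / (tp + fn)  # fraction of true hot spots that were predicted
    fpr = fp / (fp + tn)  # fraction of non-hot spots wrongly predicted
    return tpr, fpr


# Toy labels, made up purely for illustration (not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tpr, fpr = tpr_fpr(y_true, y_pred)
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")  # prints: TPR = 0.75, FPR = 0.17
```

Read this way, a method with a higher TPR recovers more of the experimentally confirmed hot spots, while a higher FPR means more non-hot spot residues are flagged, which is the trade-off the abstract describes between KFC2a and KFC2b.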