Bordner Andrew J, Abagyan Ruben
Molsoft LLC, San Diego, California, USA.
Proteins. 2005 Aug 15;60(3):353-66. doi: 10.1002/prot.20433.
Predicting protein-protein interfaces from a three-dimensional structure is a key task of computational structural proteomics. In contrast to geometrically distinct small-molecule binding sites, protein-protein interfaces are notoriously difficult to predict. We generated a large nonredundant data set of 1494 true protein-protein interfaces, using biological symmetry annotation where necessary. The data set was carefully analyzed, and a Support Vector Machine was trained on a combination of a new, robust evolutionary conservation signal and local surface properties to predict protein-protein interfaces. Fivefold cross-validation verified the high sensitivity and selectivity of the model. As many as 97% of the predicted patches overlapped the true interface patch, while an average predicted patch included only 22% of the surface residues. The model allowed the identification of potential new interfaces and the correction of mislabeled oligomeric states.
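The two figures quoted in the abstract (the share of predicted patches that overlap the true interface, and the share of surface residues in an average predicted patch) can be illustrated with a minimal sketch. The abstract does not give the exact definitions used in the paper, so this sketch assumes that "overlap" means sharing at least one residue and that coverage is the fraction of a protein's surface residues falling inside its predicted patch. The function name `patch_statistics`, the integer residue indices, and the toy data are all hypothetical.

```python
from typing import List, Set, Tuple

# Each entry: (predicted patch, true interface, all surface residues).
# Residues are identified by hypothetical integer indices.
Protein = Tuple[Set[int], Set[int], Set[int]]


def patch_statistics(proteins: List[Protein]) -> Tuple[float, float]:
    """Return (overlap rate, mean surface coverage).

    Overlap rate: fraction of proteins whose predicted patch shares at
    least one residue with the true interface (an assumed definition).
    Mean surface coverage: average fraction of surface residues that lie
    in the predicted patch.
    """
    if not proteins:
        raise ValueError("need at least one protein")
    overlapping = 0
    coverage_sum = 0.0
    for predicted, true_interface, surface in proteins:
        if not surface:
            raise ValueError("surface residue set must be non-empty")
        if predicted & true_interface:
            overlapping += 1
        coverage_sum += len(predicted & surface) / len(surface)
    n = len(proteins)
    return overlapping / n, coverage_sum / n


# Toy data, not taken from the paper.
toy = [
    ({1, 2, 3}, {3, 4, 5}, set(range(1, 11))),  # shares residue 3; covers 3 of 10
    ({7, 8}, {1, 2}, set(range(1, 11))),        # no shared residue; covers 2 of 10
]
overlap_rate, mean_coverage = patch_statistics(toy)
print(f"overlap rate: {overlap_rate:.2f}, mean coverage: {mean_coverage:.2f}")
```

Reporting both numbers together matters because either one alone can be gamed: a patch spanning the whole surface always overlaps the interface, so a high overlap rate is only informative when the average coverage stays low.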