Chen Huiling, Zhou Huan-Xiang
Department of Physics, Drexel University, Philadelphia, Pennsylvania, USA.
Proteins. 2005 Oct 1;61(1):21-35. doi: 10.1002/prot.20514.
The number of structures of protein-protein complexes deposited in the Protein Data Bank is growing rapidly. These structures embed important information for predicting the structures of new protein complexes. This motivated us to develop the PPISP method for predicting interface residues in protein-protein complexes. In PPISP, sequence profiles and solvent accessibility of spatially neighboring surface residues are used as input to a neural network. The network was trained on native interface residues collected from the Protein Data Bank. The original prediction accuracy was 70%, with 47% coverage of native interface residues. We have now extensively improved PPISP. The training set now consists of 1156 nonhomologous protein chains. Testing on a set of 100 nonhomologous protein chains showed that the prediction accuracy has increased to 80%, with 51% coverage. To solve the problem of over-prediction and under-prediction associated with individual neural network models, we developed a consensus method, cons-PPISP, that combines predictions from multiple models with different levels of accuracy and coverage. Applied to a benchmark set of 68 proteins for protein-protein docking, the consensus approach outperformed the best individual models by 3-8 percentage points in accuracy. To demonstrate the predictive power of cons-PPISP, eight complex-forming proteins with interfaces characterized by NMR were tested. These proteins are nonhomologous to the training set and have a total of 144 interface residues identified by chemical shift perturbation. cons-PPISP predicted 174 interface residues with 69% accuracy and 47% coverage, and it promises to complement experimental techniques in characterizing protein-protein interfaces.
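The two evaluation measures and the idea of a consensus over several models can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not confirm: accuracy is taken as the fraction of predicted residues that are native interface residues, coverage as the fraction of native interface residues that are predicted, and the consensus is a simple vote threshold. The function names, the voting rule, and the residue numbers are all hypothetical and are not the published cons-PPISP procedure.

```python
from collections import Counter


def accuracy_and_coverage(predicted, native):
    """Return (accuracy, coverage) for a set of predicted interface residues.

    Assumed definitions (not stated in the abstract):
      accuracy = correctly predicted / all predicted
      coverage = correctly predicted / all native interface residues
    """
    predicted, native = set(predicted), set(native)
    correct = predicted & native
    accuracy = len(correct) / len(predicted) if predicted else 0.0
    coverage = len(correct) / len(native) if native else 0.0
    return accuracy, coverage


def consensus(predictions, min_votes):
    """Keep residues predicted by at least `min_votes` models.

    A plain vote threshold, used here only as a stand-in for whatever
    combination rule the actual consensus method applies.
    """
    votes = Counter(res for pred in predictions for res in set(pred))
    return {res for res, count in votes.items() if count >= min_votes}


# Hypothetical residue numbers, for illustration only.
native = {10, 11, 12, 13, 14, 15}
models = [
    {10, 11, 12, 30, 31},  # over-predicts: two residues off the interface
    {10, 11, 13, 14, 40},  # one residue off the interface
    {10, 11, 12, 13},      # conservative: under-predicts
]

combined = consensus(models, min_votes=2)
accuracy, coverage = accuracy_and_coverage(combined, native)
print(sorted(combined), accuracy, round(coverage, 3))
```

In this toy case the vote threshold discards residues that only a single model predicts, which is how a consensus can raise accuracy relative to an over-predicting model while retaining most of its coverage.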