Yan Changhui, Honavar Vasant, Dobbs Drena
Artificial Intelligence Research Laboratory, Iowa State University, Atanasoff Hall 226, Ames, IA 50011-1040, USA.
Neural Comput Appl. 2004 Jun 1;13(2):123-129. doi: 10.1007/s00521-004-0414-3.
In this paper, we describe a machine learning approach for sequence-based prediction of protein-protein interaction sites. A support vector machine (SVM) classifier was trained to predict whether a surface residue is an interface residue (i.e., is located in the protein-protein interaction surface), based on the identity of the target residue and its ten sequence neighbors. Separate classifiers were trained on proteins from two categories of complexes, antibody-antigen and protease-inhibitor. The effectiveness of each classifier was evaluated using leave-one-out (jack-knife) cross-validation. Interface and non-interface residues were classified with relatively high sensitivity (82.3% and 78.5%) and specificity (81.0% and 77.6%) for proteins in the antibody-antigen and protease-inhibitor complexes, respectively. The correlation between predicted and actual labels was 0.430 and 0.462, respectively, indicating that the method performs substantially better than chance (zero correlation). Combined with recently developed methods for identifying surface residues from sequence information, this offers a promising approach to predicting residues involved in protein-protein interactions from sequence information alone.
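As a rough illustration of the input representation and the reported metrics, the sketch below one-hot encodes a target residue together with its ten sequence neighbors and computes sensitivity, specificity, and a correlation coefficient from predicted and actual labels. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how the ten neighbors are distributed, how residues are encoded, or how the metrics are defined, so the symmetric window (five residues on each side), the one-hot encoding, the conventional definitions of sensitivity and specificity, and the use of the Matthews correlation coefficient are all assumptions. The names `encode_window` and `evaluate` are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
HALF_WINDOW = 5  # assumption: five neighbors on each side of the target


def encode_window(sequence, center, half_window=HALF_WINDOW):
    """One-hot encode the residue at `center` and its sequence neighbors.

    Positions beyond either end of the sequence, and non-standard
    residues, are encoded as all zeros (an assumption of this sketch).
    """
    features = []
    for pos in range(center - half_window, center + half_window + 1):
        one_hot = [0] * len(AMINO_ACIDS)
        if 0 <= pos < len(sequence):
            idx = AMINO_ACIDS.find(sequence[pos])
            if idx >= 0:
                one_hot[idx] = 1
        features.extend(one_hot)
    return features


def evaluate(actual, predicted):
    """Return (sensitivity, specificity, Matthews correlation).

    Labels are 1 for interface residues and 0 for non-interface residues.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc
```

Each encoded window is a vector of 11 x 20 = 220 binary features, which could be passed to any SVM implementation. Under leave-one-out evaluation, predictions for each protein would come from a classifier trained on the remaining proteins before being scored with `evaluate`.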