Institute of Biomedical Sciences, Academia Sinica, Nankang, Taipei 115, Taiwan.
Mol Cell Proteomics. 2013 Mar;12(3):679-86. doi: 10.1074/mcp.M112.020198. Epub 2012 Dec 13.
The structures of protein complexes are increasingly predicted via protein-protein docking (PPD) using ambiguous interaction data to help guide the docking. These data often are incomplete and contain errors and therefore could lead to incorrect docking predictions. In this study, we performed a series of PPD simulations to examine the effects of incompletely and incorrectly assigned interface residues on the success rate of PPD predictions. The results for a widely used PPD benchmark dataset obtained using a new interface information-driven PPD (IPPD) method developed in this work showed that the success rate for an acceptable top-ranked model varied, depending on the information content used, from as high as 95% when contact relationships (though not contact distances) were known for all residues to 78% when only the interface/non-interface state of the residues was known. However, the success rates decreased rapidly to ∼40% when the interface/non-interface state of 20% of the residues was assigned incorrectly, and to less than 5% for a 40% incorrect assignment. Comparisons with results obtained by re-ranking a global search and with those reported for other data-guided PPD methods showed that, in general, IPPD performed better than re-ranking when the information used was more complete and more accurate, but worse when it was not, and that when using bioinformatics-predicted information on interface residues, IPPD and other data-guided PPD methods performed poorly, at a level similar to simulations with a 40% incorrect assignment. These results provide guidelines for using information about interface residues to improve PPD predictions and reveal a bottleneck for such improvement imposed by the low accuracy of current bioinformatic interface residue predictions.
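The error-tolerance experiment described in the abstract can be illustrated with a minimal sketch. It is not the authors' IPPD code. The function names `corrupt_interface_labels` and `success_rate` are hypothetical. The sketch assumes that an "incorrect assignment" means flipping the interface/non-interface state of a randomly chosen fraction of all residues in a protein; the exact sampling protocol used in the paper is not given in the abstract.

```python
import random


def corrupt_interface_labels(labels, error_rate, seed=0):
    """Return a copy of `labels` with a fraction of the states flipped.

    labels     : list of bool, True = interface residue, False = non-interface
    error_rate : fraction of residues whose state is assigned incorrectly
    seed       : seed for the random generator, for reproducibility

    Assumption: errors are spread uniformly over all residues.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    rng = random.Random(seed)
    n_flip = round(error_rate * len(labels))
    # Pick distinct residue positions whose state will be inverted.
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [(not state) if i in flip_idx else state
            for i, state in enumerate(labels)]


def success_rate(top_model_acceptable):
    """Fraction of complexes whose top-ranked model is acceptable.

    top_model_acceptable : list of bool, one entry per benchmark complex
    """
    if not top_model_acceptable:
        raise ValueError("need at least one complex")
    return sum(top_model_acceptable) / len(top_model_acceptable)


if __name__ == "__main__":
    # Toy protein: 10 interface residues followed by 40 non-interface residues.
    true_labels = [True] * 10 + [False] * 40
    for rate in (0.0, 0.2, 0.4):
        noisy = corrupt_interface_labels(true_labels, rate)
        n_wrong = sum(a != b for a, b in zip(true_labels, noisy))
        print(f"error rate {rate:.0%}: {n_wrong} of {len(noisy)} residues mislabeled")
```

In a full experiment, the corrupted labels would be passed to a docking run for each benchmark complex, and `success_rate` would then be computed over the per-complex outcomes at each error level.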