Kufareva Irina, Budagyan Levon, Raush Eugene, Totrov Maxim, Abagyan Ruben
Scripps Research Institute, La Jolla, California 92037, USA.
Proteins. 2007 May 1;67(2):400-17. doi: 10.1002/prot.21233.
Recent advances in structural proteomics call for the development of fast and reliable automatic methods for predicting the functional surfaces of proteins with known three-dimensional structure, including binding sites for known and unknown protein partners as well as oligomerization interfaces. Despite significant progress, the problem is still far from being solved. Most existing methods rely, at least partially, on evolutionary information from multiple sequence alignments projected onto the protein surface. The common drawback of such methods is their limited applicability to proteins with a sparse set of sequence homologs, as well as their inability to detect interfaces in evolutionarily variable regions. In this study, the authors developed an improved method for predicting interfaces from a single protein structure, based on local statistical properties of the protein surface derived at the level of atomic groups. The proposed Protein IntErface Recognition (PIER) method achieved an overall precision of 60% at a recall threshold of 50% at the residue level on a diverse benchmark of 490 homodimeric, 62 heterodimeric, and 196 transient interfaces (compared with the 25% precision at 50% recall expected from random residue function assignment). For 70% of proteins in the benchmark, the binding patch residues were successfully detected with precision exceeding 50% at 50% recall. The calculation took only seconds for an average 300-residue protein. The authors demonstrated that adding the evolutionary conservation signal only marginally influenced the overall prediction performance on the benchmark; moreover, for certain classes of proteins, using this signal actually degraded the prediction. Thorough benchmarking on other datasets from the literature showed that PIER yielded improved performance compared with several alignment-free and alignment-dependent predictors.
The accuracy, efficiency, and dependence on structure alone make PIER a suitable tool for automated high-throughput annotation of protein structures emerging from structural proteomics projects.
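The residue-level figures quoted above (precision at a fixed recall threshold, compared against random assignment) can be made concrete with a small sketch. This is not the PIER scoring method, which the abstract does not specify in detail; it only illustrates how such a metric might be computed from per-residue scores. The function names, the example scores, and the labels are hypothetical, and the random baseline is assumed to equal the fraction of residues that belong to the interface.

```python
from typing import Sequence


def precision_at_recall(
    scores: Sequence[float],
    labels: Sequence[bool],
    recall_target: float = 0.5,
) -> float:
    """Precision of the top-ranked residues at the first cutoff reaching recall_target.

    scores: predicted interface propensity per residue (higher = more likely).
    labels: True where the residue belongs to the known interface.
    """
    total_pos = sum(bool(x) for x in labels)
    if total_pos == 0:
        raise ValueError("labels contain no interface residues")

    # Rank residues from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_pos = 0
    for n_selected, (_, is_interface) in enumerate(ranked, start=1):
        true_pos += bool(is_interface)
        # Stop at the smallest selection whose recall meets the target.
        if true_pos / total_pos >= recall_target:
            return true_pos / n_selected

    raise ValueError("recall_target must not exceed 1")


def random_baseline(labels: Sequence[bool]) -> float:
    """Expected precision of random assignment: the fraction of interface residues."""
    return sum(bool(x) for x in labels) / len(labels)


# Hypothetical scores for eight surface residues, four of which are interface.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [True, False, True, False, False, True, False, True]

print(precision_at_recall(scores, labels, recall_target=0.5))
print(random_baseline(labels))
```

Under this reading, the reported 25% random precision would correspond to a benchmark in which roughly one in four of the residues considered is an interface residue.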