Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica snc, Rome 00133, Italy.
BMC Bioinformatics. 2012 Mar 28;13 Suppl 4(Suppl 4):S17. doi: 10.1186/1471-2105-13-S4-S17.
The identification of ligand binding sites is a key task in the annotation of proteins with known structure but uncharacterized function. Here we describe a knowledge-based method exploiting the observation that unrelated binding sites share small structural motifs that bind the same chemical fragments irrespective of the nature of the ligand as a whole.
PDBinder compares a query protein against a library of binding and non-binding protein surface regions derived from the PDB. The results of this comparison are used to derive a propensity value for each residue that correlates with the likelihood that the residue is part of a ligand binding site. The method was applied to two different problems: i) the prediction of ligand binding residues and ii) the identification of the surface cleft that harbours the binding site. In both cases PDBinder performed consistently better than existing methods. PDBinder was trained on a non-redundant set of 1356 high-quality protein-ligand complexes and tested on a set of 239 holo/apo structure pairs. On the holo set we obtained a Matthews correlation coefficient (MCC) of 0.313 and a positive predictive value (PPV) of 0.413, while on the apo set we achieved an MCC of 0.271 and a PPV of 0.372.
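The MCC and PPV figures reported above are standard per-residue classification metrics: PPV = TP/(TP+FP) and MCC = (TP·TN − FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)), where each residue is counted as a true/false positive or negative. The Python sketch below shows how these two metrics are computed once residue propensities are turned into binary predictions. It is illustrative only. The propensity scores, the 0.5 cutoff and the function names are hypothetical, and the code does not reproduce PDBinder's actual scoring or thresholding.

```python
import math


def confusion_counts(predicted, actual):
    """Count TP, FP, TN, FN over residues (parallel lists of booleans)."""
    tp = fp = tn = fn = 0
    for p, a in zip(predicted, actual):
        if p and a:
            tp += 1  # predicted binding, truly binding
        elif p and not a:
            fp += 1  # predicted binding, actually non-binding
        elif not p and a:
            fn += 1  # missed binding residue
        else:
            tn += 1  # correctly rejected non-binding residue
    return tp, fp, tn, fn


def ppv(tp, fp):
    """Positive predictive value (precision); 0.0 if nothing was predicted."""
    return tp / (tp + fp) if tp + fp else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical per-residue propensities and true binding-site labels.
    propensity = [0.9, 0.7, 0.1, 0.2, 0.8, 0.4]
    is_binding = [True, False, False, False, True, True]
    threshold = 0.5  # hypothetical cutoff, not PDBinder's
    predicted = [score >= threshold for score in propensity]

    tp, fp, tn, fn = confusion_counts(predicted, is_binding)
    print(f"PPV = {ppv(tp, fp):.3f}")          # PPV = 0.667
    print(f"MCC = {mcc(tp, fp, tn, fn):.3f}")  # MCC = 0.333
```

MCC is the more informative of the two here because binding residues are a small minority of all residues. It penalises both false positives and missed binding residues, whereas PPV only measures how many predicted binding residues are correct.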
We show that PDBinder performs better than existing methods. Its good performance on unbound (apo) proteins is particularly important for real-world applications, where the location of the binding site is unknown. Moreover, since our approach is orthogonal to those used by other programs, the PDBinder propensity value can be integrated into other algorithms to further increase their performance.