School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK.
Bioinformatics. 2010 Nov 15;26(22):2920-1. doi: 10.1093/bioinformatics/btq543. Epub 2010 Sep 22.
We propose a novel method for scoring the accuracy of protein binding site predictions: the Binding-site Distance Test (BDT) score. Recently, the Matthews Correlation Coefficient (MCC) has been used to evaluate binding site predictions, both by developers of new methods and by the assessors of the community-wide prediction experiment CASP8. Although it is a rigorous scoring method, the MCC does not take into account the actual 3D distance of the predicted residues from the observed binding site. Thus, an incorrectly predicted site that is nevertheless close to the observed binding site will obtain the same score as an equal number of non-binding residues predicted at random. The MCC is also somewhat affected by the subjectivity of determining observed binding residues and the ambiguity of choosing distance cutoffs. By contrast, the BDT method produces continuous scores ranging between 0 and 1 that relate to the distance between the predicted and observed residues. Residues predicted close to the binding site score higher than those more distant, providing a better reflection of the true accuracy of predictions. The CASP8 function predictions were evaluated using both the MCC and BDT methods and the scores were compared. The BDT scores were found to correlate strongly with the MCC scores while being less susceptible to the subjectivity of defining binding residues. We therefore suggest that this new, simple score is a potentially more robust method for future evaluations of protein-ligand binding site predictions.
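The contrast drawn above between the MCC and a distance-aware score can be illustrated with a minimal sketch. The abstract does not give the BDT formula, so the per-residue weighting 1 / (1 + (d / d0)^2), the threshold `d0`, the normalisation by the larger of the two residue counts, and the toy coordinates below are all assumptions made for illustration and should not be read as the published definition.

```python
import math

def mcc(predicted, observed, all_residues):
    """Matthews Correlation Coefficient over sets of residue identifiers."""
    tp = len(predicted & observed)
    fp = len(predicted - observed)
    fn = len(observed - predicted)
    tn = len(all_residues - predicted - observed)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def distance_score(predicted, observed, coords, d0=3.0):
    """Illustrative distance-weighted score in [0, 1].

    Assumed form (not taken from the abstract): each predicted residue
    contributes 1 / (1 + (d_min / d0)^2), where d_min is its distance to the
    nearest observed residue, and the sum is divided by the larger of the
    two residue counts so that over-prediction is penalised.
    """
    if not predicted or not observed:
        return 0.0
    total = 0.0
    for p in predicted:
        d_min = min(math.dist(coords[p], coords[o]) for o in observed)
        total += 1.0 / (1.0 + (d_min / d0) ** 2)
    return total / max(len(predicted), len(observed))

# Hypothetical toy protein: 20 residues spaced 3.8 units apart on a line.
coords = {i: (3.8 * i, 0.0, 0.0) for i in range(1, 21)}
all_residues = set(coords)
observed = {5, 6}      # observed binding residues
near_miss = {7, 8}     # wrong residues, adjacent to the site
far_miss = {19, 20}    # wrong residues, far from the site

mcc_near = mcc(near_miss, observed, all_residues)
mcc_far = mcc(far_miss, observed, all_residues)
dist_near = distance_score(near_miss, observed, coords)
dist_far = distance_score(far_miss, observed, coords)
dist_perfect = distance_score(observed, observed, coords)

print(f"MCC      near={mcc_near:.3f}  far={mcc_far:.3f}")
print(f"distance near={dist_near:.3f}  far={dist_far:.3f}  perfect={dist_perfect:.3f}")
```

In this toy case the two incorrect predictions receive the same MCC because neither overlaps the observed residues, whereas the distance-weighted score ranks the near miss above the far miss and gives a perfect prediction a score of 1.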