Marsden Philip M, Puvanendrampillai Dushyanthan, Mitchell John B O, Glen Robert C
Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK.
Org Biomol Chem. 2004 Nov 21;2(22):3267-73. doi: 10.1039/B409570G. Epub 2004 Sep 27.
We have investigated the performance of five well-known scoring functions in predicting the binding affinities of a diverse set of 205 protein-ligand complexes with known experimental binding constants, and also on subsets of mutually similar complexes. The overall performance of the scoring functions on the diverse set is disappointing: none of the functions achieves an r² value above 0.32 on the whole dataset. Performance on the subsets was mixed. Four of the five functions predicted the binding affinities of 35 proteinases fairly well, but none produced any useful correlation on a set of 38 aspartic proteinases. We consider two algorithms for producing consensus scoring functions, one based on a linear combination of the scores from the five individual functions and the other on averaging the rankings produced by the five functions. Both algorithms produce consensus functions that generally perform slightly better than the best individual scoring function on a given dataset.
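The two consensus schemes named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the tie-handling rule, the convention that a higher score means tighter binding, and the idea of passing pre-chosen weights (rather than fitting them by regression against experimental affinities) are all assumptions made for illustration. `score_table` is a hypothetical list holding one list of scores per scoring function, with complexes in the same order throughout.

```python
def ranks(values, higher_is_better=True):
    """Rank values from 1 (best) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(pos, end + 1):
            result[order[k]] = avg
        pos = end + 1
    return result


def rank_consensus(score_table):
    """Average each complex's rank over all scoring functions (lower is better)."""
    per_function = [ranks(column) for column in score_table]
    n_complexes = len(score_table[0])
    return [sum(r[i] for r in per_function) / len(per_function)
            for i in range(n_complexes)]


def linear_consensus(score_table, weights, intercept=0.0):
    """Weighted sum of the individual scores; the weights are assumed given."""
    n_complexes = len(score_table[0])
    return [intercept + sum(w * column[i] for w, column in zip(weights, score_table))
            for i in range(n_complexes)]


def r_squared(x, y):
    """Squared Pearson correlation between predicted and experimental values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Made-up scores from three hypothetical functions for four complexes.
toy_scores = [
    [7.1, 5.0, 6.2, 4.0],
    [6.5, 5.5, 6.0, 3.9],
    [7.0, 4.8, 6.6, 5.0],
]
consensus_ranks = rank_consensus(toy_scores)
consensus_scores = linear_consensus(toy_scores, weights=[0.5, 0.5, 0.0])
```

Either consensus output can then be compared with experimental affinities using `r_squared`, which corresponds to the r² statistic quoted in the abstract. Note that rank-based consensus values run in the opposite direction to the scores, since rank 1 is the best.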