Davies J R, Jackson R M, Mardia K V, Taylor C C
School of Mathematics and Institute of Molecular and Cellular Biology, University of Leeds, Leeds LS2 9JT, UK.
Bioinformatics. 2007 Nov 15;23(22):3001-8. doi: 10.1093/bioinformatics/btm470. Epub 2007 Sep 24.
The large-scale comparison of protein-ligand binding sites is problematic: measures of structural similarity are difficult to quantify, and they are not easily interpreted in terms of a statistical similarity that can ultimately be related to structure and function. We present a binding site matching score, the Poisson Index (PI), based upon a well-defined statistical model. PI requires only the number of matching atoms between two sites and the sizes of the two sites, the same information used by the Tanimoto Index (TI), a comparable and widely used measure of molecular similarity. We apply PI and TI to a set of binding sites previously extracted by an automated procedure to determine the robustness and usefulness of both scores.
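As a minimal sketch of the inputs both scores share, the Tanimoto Index can be computed from the match count and the two site sizes using its standard definition, TI = m / (n1 + n2 - m). The function and parameter names below are illustrative assumptions, not taken from the paper, and the PI formula is not reproduced here because the abstract does not state it.

```python
def tanimoto_index(n_matched: int, size_a: int, size_b: int) -> float:
    """Standard Tanimoto Index for two sites, from counts alone.

    n_matched: number of atoms matched between the two sites (assumed name)
    size_a, size_b: number of atoms in each site (assumed names)
    """
    if size_a <= 0 or size_b <= 0:
        raise ValueError("site sizes must be positive")
    if not 0 <= n_matched <= min(size_a, size_b):
        raise ValueError("n_matched must lie between 0 and the smaller site size")
    # Matched atoms divided by the size of the union of the two atom sets.
    return n_matched / (size_a + size_b - n_matched)


if __name__ == "__main__":
    # Two identical 10-atom sites, fully matched.
    print(tanimoto_index(10, 10, 10))
    # A 10-atom site fully matched inside a 50-atom site.
    print(tanimoto_index(10, 10, 50))
```

Because the match count cannot exceed the smaller site, TI is bounded above by the ratio of the smaller to the larger site size. A small site fully contained in a large one therefore still receives a low score, which is one way to see why TI is sensitive to site size.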
We found that PI outperforms TI; moreover, site similarity is poorly defined for TI at values around the 99.5% confidence level, for which PI is well defined. A difference map at this confidence level shows that PI gives much more meaningful information than TI. We show individual examples where TI fails to distinguish a false site pairing from a true one, whereas PI performs much better. TI does not handle large or small sites well, nor the comparison of a large site with a small one, whereas PI is shown to be much more robust. Despite the difficulty of determining a biological 'ground truth' for binding site similarity, we conclude that PI is a suitable measure of binding site similarity and could form the basis for a binding site classification scheme comparable to existing protein domain classification schemes.