Department of Applied Mathematics and Statistics, and Institute of Chemical Biology and Drug Discovery, Stony Brook University, Stony Brook, New York 11794, USA.
J Chem Inf Model. 2010 Nov 22;50(11):1986-2000. doi: 10.1021/ci1001982. Epub 2010 Oct 29.
A database consisting of 780 ligand-receptor complexes, termed SB2010, has been derived from the Protein Databank to evaluate the accuracy of docking protocols for regenerating bound ligand conformations. The goal is to provide easily accessible community resources for the development of improved procedures to aid virtual screening for ligands with a wide range of flexibilities. Three core experiments using the program DOCK, which employ rigid (RGD), fixed anchor (FAD), and flexible (FLX) protocols, were used to gauge performance by several different metrics: (1) global results, (2) ligand flexibility, (3) protein family, and (4) cross-docking. Global spectrum plots of successes and failures vs rmsd reveal well-defined inflection regions, which suggest that the commonly used 2 Å criterion is a reasonable choice for defining success. Across all 780 systems, success tracks with the relative difficulty of the calculations: RGD (82.3%) > FAD (78.1%) > FLX (63.8%). In general, failures due to scoring strongly outweigh those due to sampling. Subsets of SB2010 grouped by ligand flexibility (7-or-less, 8-to-15, and 15-plus rotatable bonds) reveal that success degrades linearly for the FAD and FLX protocols, in contrast to RGD, which remains constant. Despite the challenges associated with FLX anchor orientation and on-the-fly flexible growth, the success rates for the 7-or-less (74.5%) and, in particular, the 8-to-15 (55.2%) subsets are encouraging. Poorer results for the very flexible 15-plus set (39.3%) indicate substantial room for improvement. Family-based success appears largely independent of ligand flexibility, suggesting a strong dependence on the binding site environment. For example, zinc-containing proteins are generally problematic, even though their ligands are only moderately flexible.
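The success criterion and the scoring/sampling split can be made concrete with a small sketch. It assumes an in-place heavy-atom rmsd (no superposition, since docked and crystal poses share the receptor frame, and no symmetry correction), DOCK-style energies where lower is better, and one common convention for classifying failures: a failure counts as a scoring failure when the incorrect top pose scores better than the energy-minimized crystal pose, and as a sampling failure otherwise. That convention, the function names, and the handling of exactly 15 rotatable bonds are illustrative assumptions, not the published protocol.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]

SUCCESS_CUTOFF = 2.0  # Å; the 2 Å success criterion


def heavy_atom_rmsd(docked: Sequence[Coord], crystal: Sequence[Coord]) -> float:
    """In-place rmsd between two poses in the same receptor frame.

    Assumes both sequences list the same heavy atoms in the same order
    (no superposition and no symmetry correction are applied).
    """
    if len(docked) != len(crystal) or not docked:
        raise ValueError("poses must have the same non-zero number of atoms")
    sq = sum((a - b) ** 2 for p, q in zip(docked, crystal) for a, b in zip(p, q))
    return math.sqrt(sq / len(docked))


def classify(top_rmsd: float, top_score: float, crystal_min_score: float) -> str:
    """Label one docking outcome (hypothetical convention, lower score = better).

    success: the top-scored pose lies within 2 Å of the crystal pose.
    scoring_failure: the wrong pose scores better than the minimized crystal pose,
        so the scoring function, not the search, is at fault.
    sampling_failure: the crystal pose scores better but the search never found it.
    """
    if top_rmsd <= SUCCESS_CUTOFF:
        return "success"
    if top_score < crystal_min_score:
        return "scoring_failure"
    return "sampling_failure"


def flexibility_bin(n_rotatable: int) -> str:
    # Bin labels follow the abstract; placing exactly 15 bonds in "8-to-15" is an assumption.
    if n_rotatable <= 7:
        return "7-or-less"
    if n_rotatable <= 15:
        return "8-to-15"
    return "15-plus"
```

For example, shifting every atom of a two-atom pose by 1 Å gives an rmsd of 1.0 Å, which counts as a success; a 3.1 Å top pose scoring -45 against a minimized crystal pose scoring -40 would be labeled a scoring failure.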
Finally, representative cross-docking examples for the carbonic anhydrase, thermolysin, and neuraminidase families show the utility of family-based analysis for rapidly identifying particularly good or bad docking trends and the type of failure involved (scoring or sampling), which will likely interest researchers choosing specific receptors for virtual screening. SB2010 is available for download at http://rizzolab.org.
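Family-based (or flexibility-based) analysis amounts to grouping per-system outcomes and tallying success rates and failure types per group. The sketch below assumes each system has already been labeled "success", "scoring_failure", or "sampling_failure"; the function name and the toy records are hypothetical and are not SB2010 results.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def tally_by_group(records: Iterable[Tuple[str, str]]) -> Dict[str, dict]:
    """Summarize (group, outcome) pairs per group.

    group can be a protein family or a flexibility bin; outcome is assumed to be
    one of "success", "scoring_failure", or "sampling_failure".
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for group, outcome in records:
        counts[group][outcome] += 1

    summary = {}
    for group, c in counts.items():
        n = sum(c.values())
        summary[group] = {
            "n": n,
            "success_pct": 100.0 * c["success"] / n,
            "scoring_failures": c["scoring_failure"],
            "sampling_failures": c["sampling_failure"],
        }
    return summary


# Hypothetical toy records, for illustration only.
records = [
    ("carbonic anhydrase", "success"),
    ("carbonic anhydrase", "scoring_failure"),
    ("thermolysin", "sampling_failure"),
    ("thermolysin", "scoring_failure"),
    ("neuraminidase", "success"),
    ("neuraminidase", "success"),
]
summary = tally_by_group(records)
```

Here carbonic anhydrase would come out at 50% success with one scoring failure, thermolysin at 0% with one failure of each type, and neuraminidase at 100%, which is the kind of per-family contrast that points to where scoring or sampling breaks down.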