Department of Medicinal Chemistry, University of Michigan, Ann Arbor, Michigan 48109-1065, USA.
J Chem Inf Model. 2013 Aug 26;53(8):1853-70. doi: 10.1021/ci400025f. Epub 2013 May 10.
The Community Structure-Activity Resource (CSAR) recently held its first blinded exercise based on data provided by Abbott, Vertex, and colleagues at the University of Michigan, Ann Arbor. A total of 20 research groups submitted results for the benchmark exercise, where the goal was to compare different improvements for pose prediction, enrichment, and relative ranking of congeneric series of compounds. The exercise was built around blinded, high-quality experimental data from four protein targets: LpxC, Urokinase, Chk1, and Erk2. Pose prediction proved to be the most straightforward task, and most methods successfully reproduced binding poses when the crystal structure employed was co-crystallized with a ligand from the same chemical series. Multiple evaluation metrics were examined, and we found that RMSD and native contact metrics together provide a robust evaluation of the predicted poses. Notably, most scoring functions underpredicted contacts between the heteroatoms (N, O, S, etc.) of the protein and ligand. Relative ranking proved to be the most difficult task for the methods, but many of the scoring functions were able to distinguish Urokinase actives from the inactives in the series. We also found that minimizing the protein and correcting histidine tautomeric states trended positively with low RMSD for pose prediction, whereas minimizing the ligand trended negatively. Pregenerated ligand conformations performed better than those generated on the fly. Optimizing docking parameters and pretraining with the native ligand had a positive effect on docking performance, as did using restraints, substructure fitting, and shape fitting. Lastly, for both sampling and ranking scoring functions, the use of an empirical scoring function appeared to trend positively with the RMSD. Here, by combining the results of many methods, we hope to provide a statistically relevant evaluation and elucidate specific shortcomings of docking methodology for the community.
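To make the two pose-evaluation metrics named above concrete, the following is a minimal Python sketch, not the CSAR evaluation code. It assumes that the predicted and reference ligands list the same atoms in the same order, that both poses share the protein's coordinate frame (no superposition), and that a "contact" is any ligand-protein atom pair within a distance cutoff. The 4.0 Å cutoff, the function names, and the toy coordinates are illustrative assumptions and are not taken from the paper.

```python
import math

def rmsd(pred, ref):
    """RMSD between two poses with identical atom ordering (no superposition)."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref))
    return math.sqrt(squared / len(pred))

def contacts(ligand, protein, cutoff=4.0):
    """Set of (ligand_index, protein_index) pairs within `cutoff` (assumed 4.0 A)."""
    return {
        (i, j)
        for i, a in enumerate(ligand)
        for j, b in enumerate(protein)
        if math.dist(a, b) <= cutoff
    }

def native_contact_fraction(pred_ligand, ref_ligand, protein, cutoff=4.0):
    """Fraction of the reference (native) contacts that the predicted pose keeps."""
    native = contacts(ref_ligand, protein, cutoff)
    if not native:
        raise ValueError("reference pose has no contacts at this cutoff")
    predicted = contacts(pred_ligand, protein, cutoff)
    return len(native & predicted) / len(native)

if __name__ == "__main__":
    # Toy coordinates in angstroms, invented purely for illustration.
    protein = [(0.0, 3.0, 0.0), (1.5, -3.0, 0.0), (10.0, 10.0, 10.0)]
    ref_ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    pred_ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 3.0)]  # second atom displaced by 3 A
    print("RMSD:", round(rmsd(pred_ligand, ref_ligand), 3))
    print("Native contacts kept:", native_contact_fraction(pred_ligand, ref_ligand, protein))
```

In this toy case the displaced atom loses both of its reference contacts, so the pose retains half of the native contacts while its RMSD stays near 2 Å. That is the kind of complementary information the two metrics can give when used together.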