Carlson Heather A, Smith Richard D, Damm-Ganamet Kelly L, Stuckey Jeanne A, Ahmed Aqeel, Convery Maire A, Somers Donald O, Kranz Michael, Elkins Patricia A, Cui Guanglei, Peishoff Catherine E, Lambert Millard H, Dunbar James B
Department of Medicinal Chemistry, College of Pharmacy, University of Michigan, 428 Church St., Ann Arbor, Michigan 48109-1065, United States.
Center for Structural Biology, University of Michigan, 3358E Life Sciences Institute, 210 Washtenaw Ave., Ann Arbor, Michigan 48109-2216, United States.
J Chem Inf Model. 2016 Jun 27;56(6):1063-77. doi: 10.1021/acs.jcim.5b00523. Epub 2016 May 17.
The 2014 CSAR Benchmark Exercise was the last community-wide exercise conducted by the group at the University of Michigan, Ann Arbor. For this event, GlaxoSmithKline (GSK) donated unpublished crystal structures and affinity data from in-house projects. Three targets were used: tRNA (m1G37) methyltransferase (TrmD), spleen tyrosine kinase (SYK), and factor Xa (FXa). A particularly strong feature of the GSK data is its large size, which lends greater statistical significance to comparisons between different methods. In Phase 1 of the CSAR 2014 Exercise, participants were given several protein-ligand complexes and asked to identify the one near-native pose from among 200 decoys provided by CSAR. Though decoys were requested by the community, we found that they complicated our analysis. We could not discern whether poor predictions were failures of the chosen method or an incompatibility between the participant's method and the setup protocol we used. This problem is inherent to decoys, and we strongly advise against their use. In Phase 2, participants had to dock and rank/score a set of small molecules given only the SMILES strings of the ligands and a protein structure with a different ligand bound. Overall, docking was a success for most participants, and results were much better in Phase 2 than in Phase 1. However, scoring was a greater challenge. No particular approach to docking and scoring had an edge; successful methods included empirical, knowledge-based, machine-learning, and shape-fitting approaches, and even methods with solvation and entropy terms. Several groups were successful in ranking TrmD and/or SYK, but ranking FXa ligands was intractable for all participants.
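The Phase 1 task described above (picking the near-native pose out of a decoy set) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function names, the data layout, the 2.0 Å RMSD cutoff for "near-native", and the convention that lower scores are better are not taken from the abstract.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: pose id -> (score from some scoring function,
# RMSD in angstroms between that pose and the crystal pose).
PoseTable = Dict[str, Tuple[float, float]]


def picks_near_native(poses: PoseTable,
                      rmsd_cutoff: float = 2.0,
                      lower_is_better: bool = True) -> bool:
    """Return True if the best-scored pose lies within rmsd_cutoff.

    The 2.0 A default is a commonly used convention, assumed here;
    the abstract does not state the cutoff that was applied.
    """
    if not poses:
        raise ValueError("no poses supplied")
    choose = min if lower_is_better else max
    best_id = choose(poses, key=lambda pose_id: poses[pose_id][0])
    return poses[best_id][1] <= rmsd_cutoff


def success_rate(complexes: List[PoseTable], **kwargs) -> float:
    """Fraction of complexes whose top-scored pose is near-native."""
    if not complexes:
        raise ValueError("no complexes supplied")
    hits = sum(picks_near_native(poses, **kwargs) for poses in complexes)
    return hits / len(complexes)


if __name__ == "__main__":
    # Toy data: two complexes with three poses each (not real CSAR data).
    complex_a = {"p1": (-9.1, 0.8), "p2": (-7.5, 6.2), "p3": (-6.0, 9.4)}
    complex_b = {"p1": (-5.0, 1.1), "p2": (-8.2, 7.7), "p3": (-4.3, 3.0)}
    print(success_rate([complex_a, complex_b]))
```

In this toy input the best-scored pose of the first complex is near-native and that of the second is not, so a scoring function would be credited with one success out of two.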
Methods that were able to dock well across all submitted systems include MDock [1], Glide-XP [2], PLANTS [3], Wilma [4], Gold [5], SMINA [6], Glide-XP [2]/PELE [7], FlexX [8], and MedusaDock [9]. In fact, the submission based on Glide-XP [2]/PELE [7] cross-docked all ligands to many crystal structures, and it was particularly impressive to see success across an ensemble of protein structures for multiple targets. For scoring/ranking, submissions that showed statistically significant achievement include MDock [1] using ITScore [1,10] with a flexible-ligand term [11], SMINA [6] using Autodock-Vina [12,13], FlexX [8] using HYDE [14], and Glide-XP [2] using XP DockScore [2] with and without ROCS [15] shape similarity [16]. Of course, these results are for only three protein targets, and many more systems need to be investigated to truly identify which approaches are more successful than others. Furthermore, our exercise is not a competition.