Bannan Caitlin C, Burley Kalistyn H, Chiu Michael, Shirts Michael R, Gilson Michael K, Mobley David L
Department of Chemistry, University of California, 147 Bison Modular, Irvine, CA, 92697, USA.
Department of Pharmaceutical Sciences, University of California, 147 Bison Modular, Irvine, CA, 92697, USA.
J Comput Aided Mol Des. 2016 Nov;30(11):927-944. doi: 10.1007/s10822-016-9954-8. Epub 2016 Sep 27.
In the recent SAMPL5 challenge, participants submitted predictions for cyclohexane/water distribution coefficients for a set of 53 small molecules. Distribution coefficients (log D) replace the hydration free energies that were a central part of the past five SAMPL challenges. A wide variety of computational methods were represented by the 76 submissions from 18 participating groups. Here, we analyze submissions by a variety of error metrics and provide details for a number of reference calculations we performed. As in the SAMPL4 challenge, we assessed the ability of participants to evaluate not just their statistical uncertainty but also their model uncertainty, that is, how well they can predict the magnitude of their model or force field error for specific predictions. Unfortunately, this remains an area where prediction and analysis need improvement. In SAMPL4, the top-performing submissions achieved a root-mean-squared error (RMSE) of around 1.5 kcal/mol. If we anticipate accuracy in log D predictions to be similar to that of the hydration free energy predictions in SAMPL4, the expected error here would be around 1.54 log units. Only a few submissions had an RMSE below 2.5 log units in their predicted log D values. However, distribution coefficients introduced complexities not present in past SAMPL challenges, including tautomer enumeration, that are likely to be important in predicting biomolecular properties of interest to drug discovery; some decrease in accuracy would therefore be expected. Overall, the SAMPL5 distribution coefficient challenge provided great insight into the importance of modeling a variety of physical effects. We believe these types of measurements will be a promising source of data for future blind challenges, especially in view of the relatively straightforward nature of the experiments and the level of insight provided.
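The step from a 1.5 kcal/mol solvation free energy error to roughly 1.54 log units is not spelled out in the abstract. One plausible reading, offered here only as a sketch and not as the authors' confirmed procedure, is that log D depends on the difference between two solvation free energies (water and cyclohexane), so two independent errors of equal size combine in quadrature before being divided by RT ln 10. The temperature of 298.15 K, the independence assumption, and the function names below are all assumptions introduced for illustration.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 0.0019872041


def free_energy_error_to_log_units(err_kcal, temperature=298.15):
    """Convert a transfer free energy error (kcal/mol) to log D units.

    Uses log D = -dG_transfer / (R T ln 10), so an error in dG_transfer
    scales by 1 / (R T ln 10). The default temperature is an assumption.
    """
    return err_kcal / (R_KCAL * temperature * math.log(10))


def expected_log_d_error(solvation_rmse_kcal, temperature=298.15):
    """Estimate the log D error from a per-solvent solvation free energy RMSE.

    Assumes (hypothetically) that the errors in the two solvents are
    independent and equal in magnitude, so they add in quadrature.
    """
    transfer_err = math.sqrt(2.0) * solvation_rmse_kcal
    return free_energy_error_to_log_units(transfer_err, temperature)


estimate = expected_log_d_error(1.5)
print(f"Estimated log D error: {estimate:.2f} log units")
```

Under these assumptions the estimate comes out near 1.55 log units, close to the 1.54 quoted in the abstract; the small gap is consistent with intermediate rounding or a slightly different temperature, though the abstract does not say which.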