Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA.
Tri-Institutional PhD Program in Chemical Biology, Weill Cornell Graduate School of Medical Sciences, Cornell University, New York, NY, 10065, USA.
J Comput Aided Mol Des. 2020 Apr;34(4):335-370. doi: 10.1007/s10822-020-00295-0. Epub 2020 Feb 27.
The SAMPL Challenges aim to focus the biomolecular and physical modeling community on issues that limit the accuracy of predictive modeling of protein-ligand binding for rational drug design. In the SAMPL5 log D Challenge, designed to benchmark the accuracy of methods for predicting drug-like small molecule transfer free energies from aqueous to nonpolar phases, participants found it difficult to make accurate predictions due to the complexity of protonation state issues. In the SAMPL6 log P Challenge, we asked participants to make blind predictions of the octanol-water partition coefficients of the neutral species of 11 compounds and assessed how well these methods performed absent the complication of protonation state effects. This challenge builds on the SAMPL6 pKa Challenge, which asked participants to predict pKa values of a superset of the compounds considered in this log P challenge. Blind prediction sets from 91 prediction methods were collected from 27 research groups, spanning a variety of quantum mechanics (QM)-based or molecular mechanics (MM)-based physical methods, knowledge-based empirical methods, and mixed approaches. There was a 50% increase in the number of participating groups and a 20% increase in the number of submissions compared to the SAMPL5 log D Challenge. Overall, the accuracy of octanol-water log P predictions in the SAMPL6 Challenge was higher than that of cyclohexane-water log D predictions in SAMPL5, likely because only the neutral species needed to be modeled for log P and several categories of methods benefited from the vast amounts of experimental octanol-water log P data. There were many highly accurate methods: 10 diverse methods achieved an RMSE of less than 0.5 log P units. These included QM-based methods, empirical methods, and mixed methods in which physical modeling was supported by empirical corrections. A comparison of physical modeling methods showed that QM-based methods outperformed MM-based methods. The average RMSE values of the five most accurate MM-based, QM-based, empirical, and mixed-approach methods were 0.92 ± 0.13, 0.48 ± 0.06, 0.47 ± 0.05, and 0.50 ± 0.06, respectively.
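As an illustration of the accuracy metric used above, the following minimal sketch computes the RMSE of log P predictions against experimental values and then averages the RMSE of the k best methods in a category. The function names (`rmse`, `mean_rmse_of_best`), the method labels, and all numerical values are hypothetical and chosen only for illustration; they are not the challenge data, and the sketch does not reproduce how the reported uncertainties (the ± values) were obtained.

```python
import math


def rmse(predicted, experimental):
    """Root-mean-square error between predicted and experimental log P values."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mean_rmse_of_best(rmse_by_method, k=5):
    """Average RMSE over the k methods with the lowest RMSE."""
    best = sorted(rmse_by_method.values())[:k]
    return sum(best) / len(best)


# Hypothetical experimental log P values for three compounds (not challenge data).
experimental = [1.0, 2.0, 3.0]

# Hypothetical predictions from three made-up methods.
predictions = {
    "method_a": [1.0, 2.0, 3.0],
    "method_b": [2.0, 3.0, 4.0],
    "method_c": [1.0, 2.0, 5.0],
}

# RMSE per method, in log P units.
rmse_by_method = {
    name: rmse(values, experimental) for name, values in predictions.items()
}

for name, value in sorted(rmse_by_method.items(), key=lambda item: item[1]):
    print(f"{name}: RMSE = {value:.2f} log P units")

print(f"Mean RMSE of best 2: {mean_rmse_of_best(rmse_by_method, k=2):.2f}")
```

In the challenge analysis the same kind of ranking was applied within each method category (MM-based, QM-based, empirical, mixed) with k = 5.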