Tielker Nicolas, Tomazic Daniel, Heil Jochen, Kloss Thomas, Ehrhart Sebastian, Güssregen Stefan, Schmidt K Friedemann, Kast Stefan M
Physikalische Chemie III, Technische Universität Dortmund, Otto-Hahn-Str. 4a, 44227, Dortmund, Germany.
IPhT, L'Orme des Merisiers, CEA-Saclay, 91191, Gif-sur-Yvette, France.
J Comput Aided Mol Des. 2016 Nov;30(11):1035-1044. doi: 10.1007/s10822-016-9939-7. Epub 2016 Aug 23.
We predict cyclohexane-water distribution coefficients (log D) for drug-like molecules taken from the SAMPL5 blind prediction challenge using the "embedded cluster reference interaction site model" (EC-RISM) integral equation theory. This task involves the coupled problem of predicting both partition coefficients (log P) of neutral species between the solvents and aqueous acidity constants (pKa) in order to account for changes of protonation state. The first issue is addressed by calibrating an EC-RISM-based model for solvation free energies derived from the "Minnesota Solvation Database" (MNSOL) for both water and cyclohexane, utilizing a correction based on the partial molar volume. This yields a root mean square error (RMSE) of 2.4 kcal mol⁻¹ for water and 0.8-0.9 kcal mol⁻¹ for cyclohexane, depending on the parametrization. The second issue is treated by employing, on the one hand, an empirical pKa model (MoKa) and, on the other hand, an EC-RISM-derived regression of published acidity constants (RMSE of 1.5 pK units for a single model covering acids and bases). In total, at most 8 adjustable parameters are necessary for training the solvation and acidity models (2-3 for each solvent and two for the pKa). Applying the final models to the log D dataset corresponds to evaluating an independent test set comprising other, composite observables. For different cyclohexane parametrizations, the RMSE is 2.0-2.1 for the first SAMPL5 data set batch and 2.2-2.8 for the combined first and second batches. Notably, a pure log P model (assuming neutral species only) performs statistically similarly for these particular compounds. The nature of the approximations and possible perspectives for future developments are discussed.
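The coupling between log P, pKa and log D described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' EC-RISM workflow: it assumes a single ionizable site, assumes that only the neutral species enters the cyclohexane phase, and uses the standard textbook relations between solvation free energies, log P and log D. The function names, the default temperature of 298.15 K and the default pH of 7.4 are assumptions made for illustration only.

```python
import math

# Molar gas constant in kcal mol^-1 K^-1
R_KCAL = 1.987204e-3


def log_p(dg_water, dg_cyclohexane, temperature=298.15):
    """Cyclohexane/water log P of the neutral species.

    Both arguments are solvation free energies in kcal/mol.
    A more negative value in cyclohexane gives a positive log P.
    """
    return (dg_water - dg_cyclohexane) / (R_KCAL * temperature * math.log(10))


def log_d(logp, pka, ph=7.4, kind="acid"):
    """log D for one ionizable site (illustrative assumption).

    Assumes the ionized form does not partition into cyclohexane.
    For kind="base", pka is the pKa of the conjugate acid.
    """
    if kind == "acid":
        exponent = ph - pka   # acid deprotonates above its pKa
    elif kind == "base":
        exponent = pka - ph   # base protonates below its pKa
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    return logp - math.log10(1.0 + 10.0 ** exponent)


if __name__ == "__main__":
    # Hypothetical solvation free energies in kcal/mol, not data from the paper
    lp = log_p(dg_water=-6.0, dg_cyclohexane=-4.5)
    print(round(lp, 2), round(log_d(lp, pka=4.8, kind="acid"), 2))
```

When the pKa lies far from the pH on the neutral side (an acid with a high pKa or a base with a low one), the correction term vanishes and log D reduces to log P. This is the limit in which a pure log P model would be expected to perform like the full model.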