Schrödinger, Inc., 120 West 45th St., New York, New York 10036, United States.
J Chem Inf Model. 2018 Feb 26;58(2):271-286. doi: 10.1021/acs.jcim.7b00537. Epub 2018 Feb 5.
As a continuation of our work on developing a density functional theory-based pKa predictor, we present conceptual improvements to our previously published shell model, a hierarchical organization of pKa training sets that, in principle, covers all of chemical space. The improvements concern the way the compound under study is associated with data points from the training sets. By introducing a new descriptor of the local atomic environment that forgoes dependence on chemical bonding and connectivity, we are able to automatically locate the training-set molecules most relevant to the proton dissociation equilibrium under study. This new scheme yields a single predicted pKa value weighted across multiple training sets and thus patches a defect in the formulation of our previous model. With the new parametrization approach, pKa prediction no longer produces the outliers reported in previous applications of our method, ambiguity in interpreting the results is eliminated, and overall accuracy improves. Our new treatment accounts for multiple conformations at the level of both energetics and parametrization. Illustrative results are shown for several types of chemical structures that contain guanidine, amidine, amine, and phenol functional groups and are representative of practically important large, flexible drug-like molecules. The method's performance is compared with that of other previously published pKa prediction methods. Further possible improvements to the organization of the training sets and the potential application of our new local atomic descriptor to other kinds of parametrizations are discussed.