Manchester Interdisciplinary Biocentre (MIB), Manchester, Great Britain.
Phys Chem Chem Phys. 2011 Jun 21;13(23):11264-82. doi: 10.1039/c1cp20379g. Epub 2011 May 13.
The prediction of pKa continues to attract attention, with ongoing work on methods accurate enough that predicted values deviate by less than 0.50 log units from experiment. We show that a single descriptor, an ab initio bond length, can predict pKa. The emphasis is on model simplicity and on demonstrating that single-bond-length models give more accurate predictions than models built from several bond lengths. A data set of 171 phenols was studied. The carbon-oxygen bond length connecting the OH group to the phenyl ring consistently provided accurate predictions. The pKa of meta- and para-substituted phenols is predicted by a single-bond-length model to within 0.50 log units. Accurate prediction for ortho-substituted phenols, however, required splitting them into groups, called high-correlation subsets, within which the pKa of the compounds correlates strongly with a single bond length. These highly compound-specific single-bond-length models produced better predictions than models constructed with more compounds and more bond lengths. Outliers were easily identified using single-bond-length models, and in most cases the reason for the discrepancy could be determined. Furthermore, the single-bond-length models showed better cross-validation statistics than partial least squares (PLS) models constructed using more than one bond length. For all of the single-bond-length models, the root-mean-square error of estimation (RMSEE) was less than 0.50; for the majority of the models, the root-mean-square error of prediction (RMSEP) was also less than 0.50. The results support the use of multiple high-correlation subsets, each with a single bond length, to predict pKa. Six one-term linear equations are listed as a starting point for a more comprehensive list covering a larger variety of compound classes.
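As a rough illustration of what a one-term linear model of this kind involves, the sketch below fits pKa = slope × (bond length) + intercept by ordinary least squares and reports a plain root-mean-square error on a training set and a held-out set. It is not the authors' procedure: the function names `fit_one_term` and `rmse` are hypothetical, the bond lengths and pKa values are invented placeholders rather than data from the paper, and the exact definition of RMSEE used in the paper (for example, any degrees-of-freedom correction) may differ from the simple RMSE computed here.

```python
from math import sqrt


def fit_one_term(bond_lengths, pka_values):
    """Ordinary least-squares fit of pKa = slope * bond_length + intercept."""
    n = len(bond_lengths)
    if n < 2 or n != len(pka_values):
        raise ValueError("need at least two (bond length, pKa) pairs of equal count")
    mean_x = sum(bond_lengths) / n
    mean_y = sum(pka_values) / n
    sxx = sum((x - mean_x) ** 2 for x in bond_lengths)
    if sxx == 0:
        raise ValueError("bond lengths must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(bond_lengths, pka_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmse(bond_lengths, pka_values, slope, intercept):
    """Plain root-mean-square error of the fitted line on the given points."""
    residuals = [y - (slope * x + intercept)
                 for x, y in zip(bond_lengths, pka_values)]
    return sqrt(sum(r * r for r in residuals) / len(residuals))


# Placeholder values for illustration only (NOT taken from the paper):
# C-O bond lengths in angstroms and made-up pKa values.
train_r = [1.340, 1.348, 1.352, 1.358, 1.364]
train_pka = [7.2, 8.1, 8.9, 9.6, 10.3]
test_r = [1.345, 1.360]
test_pka = [7.9, 9.8]

slope, intercept = fit_one_term(train_r, train_pka)
print("slope:", slope, "intercept:", intercept)
print("training-set RMSE:", rmse(train_r, train_pka, slope, intercept))
print("held-out RMSE:", rmse(test_r, test_pka, slope, intercept))
```

Under the approach described in the abstract, one such fit would be made per high-correlation subset, with the 0.50 log unit threshold applied to the resulting errors.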