Jelfs Stephen, Ertl Peter, Selzer Paul
Novartis Institutes for BioMedical Research, Basel, Switzerland.
J Chem Inf Model. 2007 Mar-Apr;47(2):450-9. doi: 10.1021/ci600285n.
A pragmatic approach has been developed for estimating aqueous ionization constants (pKa) of druglike compounds. The method uses an algorithm that assigns ionization constants in a stepwise manner to the acidic and basic groups present in a compound. Predictions are made for each ionizable group using models derived from semiempirical quantum chemical properties and information-based descriptors. The semiempirical properties include the partial charge and electrophilic superdelocalizability of the atom(s) undergoing protonation or deprotonation. Importantly, the latter property has been extended to allow predictions for multiprotic compounds, overcoming a limitation of a previous approach described by Tehan et al. The information-based descriptors include molecular tree-structured fingerprints, based on the methodology outlined by Xing et al., with the addition of 2D substructure flags indicating the presence of other important structural features. These two classes of descriptor were found to complement one another particularly well, resulting in predictive models for a range of functional groups (including alcohols, amidines, amines, anilines, carboxylic acids, guanidines, imidazoles, imines, phenols, pyridines, and pyrimidines). Combined RMSE values of 0.48 and 0.81 were obtained for the training set and an external test set of compounds, respectively. The predictive models were based on compounds selected from the commercially available BioLoom database. The resultant speed and accuracy of the approach have also enabled the development of a Web application on the Novartis intranet for pKa prediction.
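The stepwise assignment and the RMSE figures quoted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the order in which groups are assigned, so the loop below assumes one plausible scheme (start from the fully protonated form, re-predict every unassigned group in the current ionization state, and assign the lowest pKa first). The names `assign_pkas_stepwise`, `toy_predict`, `TOY_BASE`, and `TOY_SHIFT` are hypothetical, and the numeric values are invented placeholders standing in for the descriptor-based models.

```python
import math
from typing import Callable, Dict, FrozenSet, List, Sequence

# Hypothetical predictor signature (an assumption, not from the paper):
# (group name, groups already deprotonated) -> estimated pKa.
Predictor = Callable[[str, FrozenSet[str]], float]


def assign_pkas_stepwise(groups: List[str], predict: Predictor) -> Dict[str, float]:
    """Assign one pKa per group, starting from the fully protonated form.

    Assumed ordering rule: at each step the unassigned group with the
    lowest predicted pKa is taken to lose its proton next.
    """
    assigned: Dict[str, float] = {}
    remaining = list(groups)
    while remaining:
        state = frozenset(assigned)  # groups deprotonated so far
        # Re-predict every unassigned group in the current ionization state.
        estimates = {g: predict(g, state) for g in remaining}
        next_group = min(estimates, key=estimates.get)
        assigned[next_group] = estimates[next_group]
        remaining.remove(next_group)
    return assigned


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error between predicted and observed pKa values."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Toy stand-in for the real models; all numbers are made up for illustration.
TOY_BASE = {"carboxylic_acid": 4.0, "amine": 9.0, "phenol": 10.0}
TOY_SHIFT = 0.5  # assumed pKa increase per proton already removed


def toy_predict(group: str, deprotonated: FrozenSet[str]) -> float:
    return TOY_BASE[group] + TOY_SHIFT * len(deprotonated)


result = assign_pkas_stepwise(["phenol", "carboxylic_acid", "amine"], toy_predict)
```

Passing the predictor in as a function keeps the assignment loop independent of how each pKa is estimated, so a real descriptor-based model could replace `toy_predict` without changing the loop.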