Schrödinger, Inc, 101 SW Main Street, Portland, Oregon 97204, United States.
J Chem Inf Model. 2011 Jan 24;51(1):102-4. doi: 10.1021/ci100332m. Epub 2010 Dec 7.
The recent article "Evaluation of pK(a) Estimation Methods on 211 Druglike Compounds" (Manchester, J.; et al. J. Chem. Inf. Model. 2010, 50, 565-571) reports poor results for the program Epik. Here, we highlight likely sources of the poor performance and describe work done to improve it. Running Epik in the mode intended to calculate pK(a) values for sequentially adding or removing protons, as needed to reproduce the experimental conditions, reduces the root-mean-square error (RMSE) from 3.0 to 2.18 for the 85 public compounds available from the paper. Despite this improvement, other programs in the Manchester paper still outperform Epik. The primary reason is that the public portion of the data set is not diverse, and a few key functional groups that are heavily represented in it are missing from Epik's training set. We show that incorporating these missing functional groups into the Epik training set reduces the RMSE for the public compounds to 1.04. Furthermore, these enhancements improve the overall performance of Epik on a large druglike test set.
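For readers unfamiliar with the metric, the RMSE figures quoted above are the square root of the mean squared difference between predicted and experimental pK(a) values. The following is a minimal Python sketch of that calculation, assuming the two sets of values are supplied as paired lists. The function name `rmse` and the sample numbers are illustrative only; they are not taken from the paper, the Manchester data set, or Epik.

```python
import math


def rmse(predicted, experimental):
    """Root-mean-square error between paired predicted and experimental values."""
    if len(predicted) != len(experimental):
        raise ValueError("predicted and experimental must have the same length")
    if not predicted:
        raise ValueError("at least one pair of values is required")
    # Mean of the squared per-compound errors, then the square root.
    squared_errors = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Hypothetical pK(a) values for three compounds (not from the paper).
    predicted_pka = [4.5, 9.0, 7.0]
    experimental_pka = [4.0, 10.0, 7.5]
    print(f"RMSE = {rmse(predicted_pka, experimental_pka):.2f}")
```

Because the errors are squared before averaging, a few compounds with large errors, such as those containing functional groups absent from a training set, can dominate the RMSE.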